I have been trying to stay within the 100 GB limit on file and artifact storage imposed by Wandb, so I had the idea of deleting files & artifacts from old runs.
However, I do not want to delete all files on those runs! Being able to see how the generated files progress over time is definitely useful. I don’t need all 50,000 or so logged steps on each run; keeping 100 of them, evenly spaced in time, is enough. So I wrote a script that indexes all my files on Wandb using the Python API, groups them, sorts them, and selects the files to delete.
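A minimal sketch of that selection step, assuming media files follow a `<key>_<step>_<hash>.<ext>` naming pattern (the pattern and the helper names `evenly_spaced_indices` and `select_files_to_delete` are assumptions for illustration, not the script described above). It groups files by key, sorts each group by step, and keeps `keep` evenly spaced files per group, always including the first and last:

```python
import re
from collections import defaultdict

# Hypothetical naming pattern: <key>_<step>_<hash>.<ext>, e.g. media/images/samples_1234_ab12cd.png
MEDIA_NAME = re.compile(r"^(?P<key>.+)_(?P<step>\d+)_(?P<hash>[0-9a-f]+)\.\w+$")


def evenly_spaced_indices(n: int, keep: int) -> set[int]:
    """Return up to `keep` indices spread evenly over range(n), including both ends."""
    if n <= keep:
        return set(range(n))
    if keep == 1:
        return {0}
    return {round(i * (n - 1) / (keep - 1)) for i in range(keep)}


def select_files_to_delete(names: list[str], keep: int = 100) -> list[str]:
    """Group file names by media key, sort each group by step, return the names to delete."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in names:
        match = MEDIA_NAME.match(name)
        if match is None:
            continue  # never delete files that don't match the expected pattern
        groups[match["key"]].append((int(match["step"]), name))

    to_delete = []
    for entries in groups.values():
        entries.sort()  # numeric step order, not lexicographic
        kept = evenly_spaced_indices(len(entries), keep)
        to_delete.extend(name for i, (_, name) in enumerate(entries) if i not in kept)
    return to_delete
```

Skipping names that don’t match the pattern means configs, checkpoints, and logs are left alone rather than deleted by accident.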
My problem is how slow the current API is at deleting files & artifacts: with File.delete, each deletion takes around 2 s. With hundreds of runs and tens of thousands of files per run, I am looking at weeks just to delete the files I need to remove.
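The estimate is simply the number of files to delete divided by the sustained deletion rate. A quick sketch with hypothetical figures (200 runs × 20,000 files to delete per run; these numbers are illustrative, not measured):

```python
def deletion_eta_days(num_files: int, deletes_per_second: float) -> float:
    """Wall-clock days needed to delete `num_files` at a sustained rate."""
    return num_files / deletes_per_second / 86_400  # 86,400 seconds per day


files_to_delete = 200 * 20_000  # hypothetical: 200 runs x 20,000 files each
print(round(deletion_eta_days(files_to_delete, 1 / 2), 1))     # 2 s per file -> 92.6
print(round(deletion_eta_days(files_to_delete, 200 / 60), 1))  # 200 calls/min ceiling -> 13.9
```

Even spending the entire 200 calls/minute budget on deletions would take about two weeks in this scenario, so parallelism alone can only shorten the job by a constant factor.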
I then refactored my code to use parallel workers, hoping to increase the speed severalfold, but I quickly ran into the 200 calls/minute rate limit. It even started to affect my ongoing runs.
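A minimal sketch of a client-side throttle, assuming the 200 calls/minute limit is shared by every API call from the account (including those made by active runs), so a deletion job should stay well below it. `TokenBucket` and `delete_throttled` are hypothetical helpers; the only wandb-specific assumption is that each item exposes `.delete()`, as `File` does.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Client-side limiter: allows bursts up to `capacity`, refills at `rate` tokens/s."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the missing fraction of a token to refill.
            self.sleep((1 - self.tokens) / self.rate)


def delete_throttled(files: Iterable, bucket: TokenBucket) -> int:
    """Call .delete() on each item, never faster than the bucket allows; return the count."""
    count = 0
    for f in files:
        bucket.acquire()
        f.delete()
        count += 1
    return count
```

For example, `TokenBucket(rate=1.0, capacity=1.0)` caps deletions at 60 per minute, leaving most of the budget for running jobs; the `files` argument could be the `File` objects from `wandb.Api().run(path).files()`, filtered down to the ones selected for deletion. The `clock` and `sleep` parameters exist only so the limiter can be tested without real waiting.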
Is there a better way to prune the files & artifacts so that the process completes faster?
Even after adding a mechanism to limit my deletions to 1 per second, I was still getting regular errors from the wandb API. It is now running stably at 0.5 deletions/s… ETA > 1 year.
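Since errors kept appearing even at 1 delete/s, a fixed rate may not be enough on its own. A common complement is to retry failed calls with exponential backoff and jitter. This is a generic sketch: `call_with_backoff` is a hypothetical helper, and it retries on any `Exception` by default because the exact exception type wandb raises for rate-limit errors is not confirmed here; narrowing `retry_on` to that type would be safer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 6,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      retry_on: tuple[type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error, wait base_delay * 2**attempt (capped, jittered) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Each deletion can then be wrapped as `call_with_backoff(f.delete)`, where `f` is a wandb `File` object, so transient errors slow the job down instead of aborting it.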
To me, this seems much slower than what wandb sync [...] is capable of doing… I wonder whether the rate limiter counts a sync as a single API call, even when it uploads tens of thousands of files. Could that be my solution? Would wandb sync delete online files if I delete them locally from a downloaded run and then sync the folder?
I can’t really continue testing, since I am now blocked even from the wandb web console by “rate limit exceeded” error messages. I might wait a few hours (or days) and see if the block lifts.
Hi @snobso and @aabywan, thank you both for writing in and for your valuable feedback. This specific request hasn’t come up in a while, so there has been no change on batch deletion or on how the API handles large numbers of file deletions. At this time, users can:
Delete an entire run and its files.
Rate-limit their calls for individual file deletions.
I have filed a feature request with our engineering team and will keep you updated once they’ve reviewed it.
Ideally, I would like the “files” option to work more like a file explorer, supporting the usual file management operations (move, delete, rename, copy, and so on), similar to the file explorer in an operating system.