How to speed up batch delete of files & artifacts?

I have been trying to stay within the 100GB limit that Wandb imposes on files and artifact storage, so my idea is to delete files & artifacts from old runs.

However, I do not want to delete all files on those runs! It is definitely useful to be able to see the progression of generated files over time. I don’t need to see all 50,000 or so logged steps on each run; keeping 100 of them, evenly spaced in time, is enough. So I wrote a script to do that by indexing all my files on Wandb using the Python API, grouping them, sorting them, and selecting the files to delete.
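As a rough illustration of the selection step, here is a minimal sketch. It assumes each file can be mapped to an integer step number, and it spaces the kept items evenly by position in the sorted step list rather than by wall-clock time. The function names are hypothetical and not part of the wandb API.

```python
def evenly_spaced_keep(steps, keep=100):
    """Return the set of steps to keep: up to `keep` items, evenly spaced
    by position in the sorted list of steps."""
    ordered = sorted(steps)
    n = len(ordered)
    if keep <= 0:
        return set()
    if n <= keep:
        return set(ordered)
    if keep == 1:
        return {ordered[0]}
    # Spread `keep` indices over 0 .. n-1, always including first and last.
    indices = {round(i * (n - 1) / (keep - 1)) for i in range(keep)}
    return {ordered[i] for i in indices}


def split_keep_delete(steps, keep=100):
    """Split steps into (kept, to_delete), both sorted ascending."""
    kept = evenly_spaced_keep(steps, keep)
    to_delete = [s for s in sorted(steps) if s not in kept]
    return sorted(kept), to_delete


if __name__ == "__main__":
    kept, to_delete = split_keep_delete(range(50_000), keep=100)
    print(len(kept), "kept;", len(to_delete), "to delete")
```

Only the steps in `to_delete` would then be passed on to the (slow) deletion calls.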

My issue is how slow the current API seems to be at deleting files & artifacts: using File.delete, it takes around 2s per file. With hundreds of runs and tens of thousands of files per run, I am looking at weeks to delete everything I need to delete.

I then tried to refactor my code into parallel workers, thinking I could increase that speed several fold, but I quickly ran into the 200 calls/minute rate limit. It even started to affect my ongoing runs.

Is there any better way I could prune the files & artifacts so that I could have the process complete faster?

While I do not have the answer, I would like to bump this post and hope W&B could chime in with a solution.

I was also looking at deleting specific files from a run to keep within the quota, but also to avoid saving files that would never be used, e.g., models within a sweep that performed poorly.

The files are not logged as artifacts in my case.

Hope someone can help.
I note that other people are requesting support with similar issues, as seen in [1] and [2].

Even after adding a mechanism to limit my deletions to 1 per second, I was still getting regular errors from the wandb API. It is now running stably at 0.5 deletes/s… ETA > 1 year :sweat_smile:
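For anyone attempting the same thing, a throttling mechanism along these lines could look like the sketch below. It is not the script used above. `throttled_delete` and its parameters are hypothetical names, the 2-second interval mirrors the 0.5 deletes/s figure, and the broad `except Exception` is a placeholder that should be narrowed to whatever error your client actually raises when rate limited.

```python
import time


def throttled_delete(items, delete_fn, min_interval=2.0, max_retries=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call delete_fn(item) for each item, spacing calls at least
    `min_interval` seconds apart. On failure, back off exponentially and
    retry up to `max_retries` attempts. Returns the items that never succeeded."""
    failed = []
    last_call = None
    for item in items:
        for attempt in range(max_retries):
            # Wait until at least min_interval has passed since the last call.
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                delete_fn(item)
                break
            except Exception:  # assumption: narrow this to the client's error type
                sleep(min_interval * 2 ** attempt)
        else:
            # All attempts failed; record the item and move on.
            failed.append(item)
    return failed


# Possible usage with the wandb public API (run path is a placeholder):
#   import wandb
#   run = wandb.Api().run("entity/project/run_id")
#   failed = throttled_delete(files_to_delete, lambda f: f.delete())
```

Throttling like this keeps the process stable but does not make it faster; the total time is still bounded by one API call per file.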

To me, this seems much slower than what wandb sync [...] is capable of doing… I wonder if the rate limiting counts that as just 1 API call, even if it uploads tens of thousands of files. I wonder if that could be my solution? Would wandb sync be able to delete online files if I delete them from a downloaded run locally, then sync the folder?

My testing cannot really proceed, since I am now blocked from even the wandb web console by “rate limit exceeded” error messages. I might wait a few hours (or days) and see if it disappears.

Hi @snobso and @aabywan, thank you both for writing in and providing your valuable feedback. This specific request hasn’t surfaced in a while, so the status of batch deletion, or of improvements in how the API handles many file deletions, hasn’t changed. At this time the user:

  • Could delete an entire run and its files
  • Could rate-limit their calls for individual file deletions

I filed a feature request with eng and will keep you updated once they’ve reviewed your request.

Thanks. Is that feature request anywhere public, so that I could vote for it?


I’m not sure if this would apply to all use cases, but in my case, a RegEx-based or glob-like deletion API would be extremely helpful.
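Until something like that exists server-side, the pattern matching can at least be done client-side on the list of file names. The sketch below is only that: `match_names` is a hypothetical helper, not a wandb function, and it does not reduce the number of delete calls, since each matched file still has to be deleted individually.

```python
import fnmatch
import re


def match_names(names, glob=None, regex=None):
    """Return the names that match the glob and/or regex filters.
    A filter left as None is ignored; if both are given, both must match."""
    pattern = re.compile(regex) if regex is not None else None
    matched = []
    for name in names:
        # fnmatchcase is case-sensitive; note that '*' also matches '/'.
        if glob is not None and not fnmatch.fnmatchcase(name, glob):
            continue
        if pattern is not None and not pattern.search(name):
            continue
        matched.append(name)
    return matched


if __name__ == "__main__":
    # Example file names are made up for illustration.
    names = [
        "media/images/sample_10.png",
        "media/images/sample_20.png",
        "config.yaml",
    ]
    print(match_names(names, glob="media/images/*.png"))
    print(match_names(names, regex=r"sample_\d+\.png$"))
```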

Appreciate the feature request.

My preference would be for the “files” option to work more like a file explorer, where you can do the expected file management, including move, delete, rename, copy, and so on, similar to the file explorers on operating systems.

Best,
Peter.
