What are the files under "artifact/" for each run?

In an effort to comply with the 100GB storage limit, I have been looking closely at the “usage” page (https://wandb.ai/usage/your-username-here), indexing files, and then deleting them using the export API. There is sometimes a significant discrepancy between the run size reported by https://wandb.ai/usage/your-username-here/your-project-here/runs and the size reported by https://wandb.ai/usage/your-username-here/your-project-here/runs/your-run-here.

There is a set of files that is returned by the Run.files() call but is invisible when looking at the run on the web dashboard (I suspect these files are the cause of the size discrepancy). In the extreme case of some of my runs, they are JSON files with paths starting with artifact/, and they number in the tens of thousands, e.g.:

artifact/142190504/wandb_manifest.json
artifact/142190516/wandb_manifest.json
artifact/142190533/wandb_manifest.json
artifact/142190533/wandb_manifest.json.deadlist
artifact/142190544/wandb_manifest.json
artifact/142190544/wandb_manifest.json.deadlist

These files obviously relate to artifacts, but the artifacts themselves are queried through Run.logged_artifacts(). I have been deleting artifacts in my past runs, keeping fewer than 400 per run (these are very small artifacts), and I expected the count of these “manifest” files to go down. Instead, the deletions seem to have created additional .deadlist files…
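
To see how much of a run's storage sits under artifact/ compared with everything else, the files returned by Run.files() can be grouped by their top-level path component. This is a minimal sketch, assuming each file object exposes `.name` and `.size` as the public API's File objects do; `summarize_by_prefix` and `summarize_run` are hypothetical helper names, and the run path is the placeholder from the URLs above.

```python
from collections import defaultdict


def summarize_by_prefix(files):
    """Return {top-level path component: (file count, total bytes)}.

    `files` is any iterable of objects with `.name` and `.size`,
    such as the File objects yielded by Run.files().
    """
    totals = defaultdict(lambda: [0, 0])
    for f in files:
        # Files without a directory component are grouped under "(root)".
        prefix = f.name.split("/", 1)[0] if "/" in f.name else "(root)"
        totals[prefix][0] += 1
        totals[prefix][1] += f.size or 0  # treat a missing size as 0
    return {prefix: (count, size) for prefix, (count, size) in totals.items()}


def summarize_run(run_path):
    """Fetch a run through the public API and summarize its files.

    Needs the `wandb` package and a logged-in API key.
    """
    import wandb  # imported lazily so the helper above works without it

    run = wandb.Api().run(run_path)
    return summarize_by_prefix(run.files())


# Example call (placeholder path, not a real run):
# summary = summarize_run("your-username-here/your-project-here/your-run-here")
# for prefix, (count, size) in sorted(summary.items()):
#     print(f"{prefix:20s} {count:8d} files {size / 1e9:8.3f} GB")
```

If the "artifact" row accounts for most of the bytes, that would support the suspicion that these hidden files explain the gap between the two usage pages.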

My questions are:

  1. What are these files?
  2. Are they important?
  3. What can I expect to happen if I manually delete them through the export API?
  4. Are they counted towards the 100GB limit?

Hello!

Firstly, could you describe the significant discrepancy in more detail, preferably with screenshots? Also, to confirm a few things so that I can reproduce this myself:

  1. You are getting the artifacts via run.logged_artifacts() and then calling .delete() on them.
  2. These artifacts belong to old runs, so all runs are finished when you delete the artifacts.
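
For reference, the deletion procedure described in the two points above might look roughly like the following. This is only a sketch under stated assumptions: `select_artifacts_to_delete` and `prune_run` are hypothetical helpers, the `created_at` values are assumed to be timestamp strings in one consistent ISO 8601 format (so that string order matches time order), and the default is a dry run that deletes nothing.

```python
def select_artifacts_to_delete(artifacts, keep):
    """Return all but the `keep` newest artifacts, oldest first.

    Assumes every artifact has a `created_at` string in the same
    ISO 8601 format, so sorting the strings sorts by time.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(artifacts, key=lambda a: a.created_at)
    return ordered[: max(len(ordered) - keep, 0)]


def prune_run(run_path, keep, dry_run=True):
    """Delete the oldest logged artifacts of a run that is no longer running.

    Needs the `wandb` package and a logged-in API key.
    Returns the number of artifacts selected for deletion.
    """
    import wandb  # imported lazily so the selection logic works without it

    run = wandb.Api().run(run_path)
    if run.state == "running":
        raise RuntimeError("run is still running; refusing to prune")

    doomed = select_artifacts_to_delete(list(run.logged_artifacts()), keep)
    for artifact in doomed:
        if dry_run:
            print("would delete", artifact.name)
        else:
            # delete_aliases=True also removes aliases attached to the
            # artifact version, which otherwise block the deletion.
            artifact.delete(delete_aliases=True)
    return len(doomed)


# Example call (placeholder path, not a real run):
# prune_run("your-username-here/your-project-here/your-run-here", keep=400)
```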

As for wandb_manifest.json, this is the file we use to store data about the artifact. For example, if the same file is uploaded under two different artifacts, we will not have you upload it twice; the second artifact references the file that is already stored. This is done via wandb_manifest.json. As for the .deadlist files that are showing up, those should be temporary files generated while the manifests are being deleted.
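
The deduplication idea can be illustrated with a toy model. This is not the actual wandb_manifest.json format or W&B's implementation; it is a sketch that assumes a manifest maps each path to a content digest and that stored content is keyed by that digest. `build_manifest` and the in-memory `store` dict are hypothetical.

```python
import hashlib


def build_manifest(files, store):
    """Record `files` ({path: bytes}) in a manifest, uploading only new content.

    `store` is a dict keyed by content digest and stands in for remote
    storage. Returns (manifest, number of blobs actually uploaded).
    """
    manifest = {}
    uploaded = 0
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:
            # First time this content is seen: "upload" it.
            store[digest] = content
            uploaded += 1
        # The manifest only references the content by digest.
        manifest[path] = digest
    return manifest, uploaded


store = {}
first, uploaded_first = build_manifest(
    {"model.bin": b"weights", "notes.txt": b"hello"}, store
)
second, uploaded_second = build_manifest({"copy/model.bin": b"weights"}, store)
print(uploaded_first, uploaded_second)  # the second artifact uploads nothing new
```

In this model, removing a manifest leaves the stored content with nothing that maps paths to it, which is consistent with the warning that deleting the manifest would likely make the artifact unusable.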

As for the effect of deleting these files: it is important to keep them, since we rely on the manifest when using Weave functions and when referencing the Artifact in other parts of the code. Since these files are generated per artifact, they do not normally amount to a significant quantity of stored data. However, if you have many runs that each create their own artifacts, a manifest will be created for each one. I will have to check whether the manifests are included in the storage calculation.

Thanks for your attention to this issue!

Here are the screenshots that show the discrepancy. As you can see, the overall run storage size is listed as 9.4GB, but when clicking into the run and viewing the breakdown, the overall size becomes 3.3GB. Note that those artifact/ files are not visible in the breakdown.

Regarding what you wanted to confirm:

  1. Yes
  2. Yes

Thanks for the insight into the role of these files. If the .deadlist files are meant to be temporary, can I expect them to disappear on their own over time? Note that in the run I show in the screenshots, there are almost as many .deadlist files as there are wandb_manifest.json files, around 30,000 of each (working from memory for the count).

Hello! After looking around, there are two reasons that this could be happening:

  1. If you are deleting many artifacts all at once, we could still be processing the deletion, and the storage figure will be updated after some time.
  2. There is a bug when deleting the last known artifact programmatically: the size of the artifact still remains at that of the last known file even though the artifact has been deleted.

I have also been informed that anything uploaded as part of the Artifact is counted toward storage; however, deleting the manifest will likely render the artifact unusable.

Could you send me a link to one of your workspaces/projects so I can dive into it a little more?

Sure!
Please check out the following run as an example (I’m including both the link to the run workspace and the run as shown in Usage):

Hello! After a bit of time, it looks like your Usage has decreased, which could have been us still processing the deletion. As for the storage, I would treat the larger box containing the storage figure as the true value of how much storage is in use in wandb (as shown with my storage below).
[image: screenshot of the storage usage box]

Hi, I originally had some doubts about your statement, since the size of my pruned runs was still bigger than I’d hoped. After accounting for the various files remaining inside, though, it seems that the storage adds up correctly now, and the contribution of the .deadlist files, if any, is negligible.
Guess I’ll just have to prune my runs even more harshly then.
Thank you for your clarification!
