Artifacts (local) caching - how does it really work?

Hi all,

I’m trying to figure out how the caching of artifacts works. Let’s say I want to download a model artifact to run some evaluation on. I don’t need the file to persist on disk; I just want to load it into memory. What I do right now in my evaluation script is:

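import os
import tempfile

import wandb

# Assumes an active run (wandb.init() was called) and that model_weights_uri is defined.
artifact = wandb.use_artifact(model_weights_uri)
with tempfile.TemporaryDirectory() as tmpdirname:
    # Download the artifact's files into a throwaway directory.
    artifact.download(tmpdirname)
    # load_pickle: a small helper that unpickles the file (not shown).
    model_weights = load_pickle(os.path.join(tmpdirname, "model_weights.pickle"))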

From that point on I use model_weights as loaded into memory.

My first question is: if I run the code twice (on the same machine), will the model weights be downloaded again, or are they cached somewhere (assuming the logged artifact wasn’t changed, of course)? And if they are cached, where?
I’m also not clear about the artifact directory (which is used if I run artifact.download() without any argument). Does that directory serve as a cache? If so, what is the .cache directory used for?

I would appreciate answers to my questions and perhaps a general explanation of the artifact caching mechanism & best practices.

Thanks!
Ran

Hi @ranshadmi-nexite,

Thank you for your question. You are right, all Artifacts are cached on your system under ~/.cache/wandb/artifacts and organized by their checksum. So if you try to download a file with checksum x and that file has been logged in an Artifact from your machine or downloaded to your machine as part of an artifact before, we just pull it from the cache by checking if there is a cached Artifact file with checksum x.
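To make the checksum lookup concrete, here is a minimal sketch of a cache keyed by checksum. It is only an illustration of the idea, not wandb's actual implementation: the ChecksumCache class, the sha256_hex helper, the choice of SHA-256, and the fetch callback are all hypothetical names and choices made for this example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents (hash choice is illustrative)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class ChecksumCache:
    """Toy cache that stores each file under its checksum (hypothetical, not wandb's code)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch_count = 0  # number of cache misses that triggered a fetch

    def get(self, checksum: str, fetch) -> Path:
        """Return the cached path for `checksum`; call `fetch(dest)` only on a miss."""
        cached = self.root / checksum
        if not cached.exists():
            fetch(cached)  # stands in for the network download
            self.fetch_count += 1
        return cached


def demo() -> int:
    """Request the same checksum twice and return how many fetches happened."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "model_weights.pickle"
        source.write_bytes(b"pretend these are weights")
        cache = ChecksumCache(Path(tmp) / "cache")
        checksum = sha256_hex(source)

        def fetch(dest: Path) -> None:
            shutil.copyfile(source, dest)

        cache.get(checksum, fetch)  # miss: the file is fetched
        cache.get(checksum, fetch)  # hit: served from the cache
        return cache.fetch_count
```

In this sketch the second request for the same checksum never calls fetch, which is the behaviour described above for a file that was already logged from or downloaded to the machine.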

So, if you run the same code twice, assuming the version of the artifact you are trying to download has not changed, the artifact can simply be picked up from your cache directory.

Also, when calling artifact.download() without any arguments, the artifact is saved under the directory in which the code is running (by default in an artifacts/ subdirectory there). This, however, is not the directory that serves as the cache. The cache remains .cache, which acts as the central location that is checked for an artifact's files before they are fetched.
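If you want to see what is sitting in the cache, a rough sketch like the one below can help. It assumes the default location mentioned above (~/.cache/wandb/artifacts) and assumes that the WANDB_CACHE_DIR environment variable, when set, replaces the ~/.cache/wandb part; please verify both against your wandb version. The function names are made up for this example.

```python
import os
from pathlib import Path


def guess_artifact_cache_dir() -> Path:
    """Best-effort guess of the artifact cache location (assumption, not an official API)."""
    base = os.environ.get("WANDB_CACHE_DIR")
    # Fall back to the default location mentioned in this thread.
    root = Path(base) if base else Path.home() / ".cache" / "wandb"
    return root / "artifacts"


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under `path`; 0 if the directory is missing."""
    if not path.is_dir():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


if __name__ == "__main__":
    cache_dir = guess_artifact_cache_dir()
    print(f"cache dir: {cache_dir}")
    print(f"size on disk: {dir_size_bytes(cache_dir)} bytes")
```

Running this before and after your evaluation script should show whether the cache grew on the first run and stayed the same on the second.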

Thanks,
Ramit
