Force local cache usage

Hi everyone,

I know artifacts are cached locally with a checksum so that retrieval on subsequent .download() calls is faster. Does this only work for whole artifacts, or for individual files in the artifact as well?
I have code like artifact.get_entry("reference_images.pt").download(), which I need to call often for evaluation. Unfortunately, I have noticed that I often get timeouts from the wandb client library, which I assume are caused by the rate limiter.
Is there a way to circumvent the server calls and force wandb to load an artifact from the local cache, or at least to make sure the local cache works as intended?

Thank you for reaching out to W&B Technical Support!

W&B does cache artifacts locally to speed up the retrieval process. This caching applies to both whole artifacts and individual files within an artifact. When you call .download() on an artifact or use .get_entry().download() for a specific file, W&B will first check the local cache to see if the file is available before attempting to download it from the server.

If you are experiencing timeouts, they could be caused by rate limiting or network issues. Separately, to keep the local artifact cache in good shape, you can clean it up to remove files that have not been used recently, which can help with performance. You can run the following command in your terminal to prune the cache:

$ wandb artifact cache cleanup

Additionally, you can limit the size of the cache to a certain threshold, such as 1GB, to manage local storage:

$ wandb artifact cache cleanup 1GB

For your specific use case, if you want to force W&B to load an artifact from the local cache, you should ensure that the artifact has been downloaded at least once so that it’s available in the cache. After the initial download, subsequent calls to .download() or .get_entry().download() should use the cached version, unless the cache has been cleared or the file has been updated on the server.
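If the evaluation loop asks for the same file many times within one process, one way to avoid touching the client library at all after the first download is to remember the returned path yourself. The following is only a minimal sketch of that idea: local_copy and _resolved_paths are hypothetical names, not part of the wandb API, and it assumes the download callable returns the local file path, as entry.download() does in the example in this thread.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical helper, not part of the wandb API: remembers where each entry
# was downloaded so that repeated lookups never reach the client library.
_resolved_paths: Dict[str, Path] = {}


def local_copy(name: str, download: Callable[[], str]) -> Path:
    """Return a local path for `name`, calling `download` only when no
    previously returned path for that name still exists on disk."""
    known = _resolved_paths.get(name)
    if known is not None and known.exists():
        return known  # reuse the earlier download, no client call
    # First request (or the file was removed): perform the real download.
    path = Path(download())
    _resolved_paths[name] = path
    return path
```

With wandb, the call might look like local_copy("reference_images.pt", lambda: artifact.get_entry("reference_images.pt").download()). Note that this skips any check for a newer version on the server, so it only fits cases where the file is not expected to change during the process.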

Here’s an example of how you might use the artifact caching:

import wandb

# Initialize a W&B run
run = wandb.init()

# Use an artifact (this will download it if it's not already cached)
artifact = run.use_artifact("my_dataset:latest")

# Get a specific entry from the artifact
entry = artifact.get_entry("reference_images.pt")

# Download the entry (this will use the local cache if available)
local_path = entry.download()

# Use the file from the local path for evaluation
# ...

# Finish the run
run.finish()

In this example, the entry.download() method will check the local cache first. If the file is found in the cache and has not been updated on the server, it will use the cached version, avoiding the need to re-download the file.
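For the timeouts themselves, a generic workaround is to retry the call with exponential backoff. The sketch below is not a wandb feature: with_retries and its parameters are hypothetical, and because this thread does not say which exception the client raises on a timeout, it retries on any Exception by default. Narrow retry_on once you know the actual exception type.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds, waiting base_delay * 2**i seconds
    after the i-th failed attempt (i starts at 0)."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** i)
    raise ValueError("attempts must be at least 1")
```

In the example above, the download line could then become local_path = with_retries(entry.download). The sleep parameter exists only so the waiting can be replaced in tests.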
