Download artifact from Google Bucket

Hi,
we are using W&B to manage our artifact versions, but the files themselves live in a Google Cloud Storage bucket.

I have noticed that when using the wandb Python library, i.e.:

import wandb

api = wandb.Api()
artifact = api.artifact("name")  # fetch the artifact by name
artifact.download()  # download all files to the local artifact directory

the download runs at 500-600 Mbit/s.

Instead, using:

gcloud storage cp --recursive gs://<bucket>/<artifact_folder>/* /path

the download runs at 4-5 Gbit/s, which is the maximum throughput of the VM's disk.

Any suggestions on how to speed it up?

Thanks

Hello @power46!

Since you are storing your files in a Google bucket, are you using Reference Artifacts? With reference artifacts, the download comes from the bucket itself instead of from our servers, which should be quite a bit faster.
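For reference, here is a minimal sketch of how a reference artifact is logged so that it tracks files already sitting in GCS (the project, artifact, and bucket names below are placeholders):

import wandb

# Log a reference artifact: W&B records checksums and the gs:// URIs,
# while the files themselves stay in the bucket.
with wandb.init(project="my-project") as run:
    artifact = wandb.Artifact("my-dataset", type="dataset")
    artifact.add_reference("gs://my-bucket/datasets/v1")
    run.log_artifact(artifact)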

Thanks for your reply.
Yes, we are already using reference artifacts.

After looking around, the difference may be due to the recursive download you are using with gcloud. We use the defaults for the artifact download, so the gap looks like it comes purely from how the API calls are made, but I am investigating the differences further. As of right now, however, there isn't a faster way to download artifacts, since we download reference artifacts with only one method.
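In the meantime, one possible workaround is to resolve the reference URIs from the artifact manifest yourself and hand the actual transfer to gcloud. A minimal sketch, assuming every manifest entry is a gs:// reference under a single prefix (and that your wandb version exposes entry.ref on manifest entries):

import os.path
import subprocess
import wandb

api = wandb.Api()
artifact = api.artifact("name")  # same placeholder artifact name as above

# Collect the gs:// URIs the reference artifact points at.
refs = [entry.ref for entry in artifact.manifest.entries.values()]

# Trim the common prefix back to a directory boundary.
prefix = os.path.commonprefix(refs).rsplit("/", 1)[0]

# Let gcloud do the copy; it parallelizes and can saturate the disk.
subprocess.run(
    ["gcloud", "storage", "cp", "--recursive", f"{prefix}/*", "/path"],
    check=True,
)

This keeps W&B as the source of truth for artifact versions while using gcloud's parallel transfer for the bytes.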

Thanks for looking into this.
As you can imagine, it is a bit annoying not to be able to take advantage of the full bandwidth.
