We are using W&B to manage our artifact versions, but the files themselves live in a Google Cloud Storage bucket.
I have noticed that downloading with the wandb Python library, i.e.:
api = wandb.Api()
artifact = api.artifact("name")
path = artifact.download()
runs at 500-600 Mbit/s, whereas downloading directly from the bucket:
gcloud storage cp --recursive gs://<bucket>/<artifact_folder>/* /path
runs at 4-5 Gbit/s, the maximum throughput of the VM's disk.
Any suggestions on how to speed it up?
Since you are storing your files in a Google bucket, are you using Reference Artifacts? This changes downloads to come from the bucket itself instead of from our servers, which should be quite a bit faster.
Thanks for your reply.
Yes, we are already using reference artifacts.
After looking around, the difference may be due to the recursive download that gcloud uses. We are using the defaults for the artifact download, so the gap appears to come purely from how the API calls fetch the files, but I am investigating the differences further. As of right now, however, there isn't a faster way to download artifacts, since we download reference artifacts with only one method.
Thanks for looking into this.
As you can imagine, it is a bit annoying not to be able to take advantage of the full bandwidth.
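One possible workaround, sketched below under some assumptions: for reference artifacts, each entry in the artifact manifest carries the original gs:// URI in its ref field, so you could collect those URIs and hand them to gcloud storage cp in a single invocation, bypassing the library's download path. This assumes your wandb version exposes artifact.manifest.entries with a .ref attribute on each entry; the artifact name, alias, and destination below are placeholders.

```python
import subprocess

def gcs_refs(entries):
    """Collect gs:// reference URIs from a manifest mapping (path -> entry).

    Entries without a ref (or with a non-GCS ref) are skipped.
    """
    return [e.ref for e in entries.values() if e.ref and e.ref.startswith("gs://")]

def gcloud_cp_cmd(uris, dest):
    """Build a single 'gcloud storage cp' command copying all URIs into dest."""
    return ["gcloud", "storage", "cp", *uris, dest]

# Usage (requires W&B credentials; names are placeholders):
# import wandb
# art = wandb.Api().artifact("entity/project/name:alias")
# uris = gcs_refs(art.manifest.entries)
# subprocess.run(gcloud_cp_cmd(uris, "/data/artifact"), check=True)
```

Note this trades away the library's checksum verification and local cache bookkeeping, so it is only worthwhile when raw throughput matters more than those guarantees.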
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.