Get S3 Filepath for WandB Artifact

Hi team,

If I use W&B Artifacts with an S3 Reference, is there any easy way to get the underlying S3 URI?

artifact = run.use_artifact('my_artifact:latest')
s3_path = artifact.<some_method>()

For context, I’m trying to train a Hugging Face model with SageMaker. When I spin up the job, SageMaker expects either local file paths or S3 URIs for the dataset. I’m trying to avoid downloading the data to my local machine by getting the underlying S3 URI via W&B Artifacts, since I’m already using Artifacts for data versioning.

Thank you in advance!
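To make the intended workflow concrete, here is a minimal sketch. The SageMaker calls are shown as comments since they require AWS credentials, and all names (`my-bucket`, `train.py`, the channel name) are illustrative, not from the original post:

```python
# Sketch of the intended workflow: hand an S3 URI straight to a SageMaker
# training job, skipping any local download. SageMaker channel inputs
# accept either local paths or s3:// URIs.

def is_s3_uri(path):
    """Return True if the path is an s3:// URI (usable as a SageMaker channel)."""
    return path.startswith("s3://")

# Hypothetical URI we'd like to recover from the W&B artifact:
s3_path = "s3://my-bucket/datasets/train"

# Requires AWS credentials; shown for illustration only:
# from sagemaker.huggingface import HuggingFace
# estimator = HuggingFace(entry_point="train.py", ...)
# estimator.fit({"train": s3_path})  # no local download needed

print(is_s3_uri(s3_path))  # True
```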

Hey Sahil,

We don’t save it automatically, but you can save the S3 URI in the artifact’s metadata object. Here is a link to the documentation for more details.

Regards,
Arman

Hi @armanharutyunyan and @sahilchopra,

I would love to have a similar function.

When you go to your dataset artifact in the WandB web interface, it actually shows, under Files, the S3 URLs of all objects within the dataset.

It would be awesome if we could use the same logic as described here (Weights & Biases) without needing to actually download the data.

Especially since most of us will eventually train on GCP or AWS, the workflow will almost never include downloading the files onto the current instance.
TensorFlow lets you read and query data from S3 or GCS buckets without any problems.

Please look into this.


I found the S3 path for remotely tracked artifacts buried in artifact.manifest.entries[some file].ref.
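To flesh that out, here is a sketch that collects the reference URIs from a manifest. The `SimpleNamespace` objects are stand-ins shaped like a reference artifact's manifest (the real thing comes from `run.use_artifact()`), and the file paths are invented for illustration:

```python
# Sketch: map each file in an artifact's manifest to its s3:// reference.
# For reference-type artifacts, artifact.manifest.entries is a dict of
# file path -> entry, where entry.ref holds the reference URI.
from types import SimpleNamespace

def s3_refs(artifact):
    """Return {file path: s3:// URI} for the artifact's reference entries."""
    return {
        path: entry.ref
        for path, entry in artifact.manifest.entries.items()
        if entry.ref and entry.ref.startswith("s3://")
    }

# Stand-in for a remotely tracked (reference) artifact:
fake = SimpleNamespace(
    manifest=SimpleNamespace(entries={
        "train.csv": SimpleNamespace(ref="s3://my-bucket/data/train.csv"),
        "readme.md": SimpleNamespace(ref=None),  # locally tracked file
    })
)

print(s3_refs(fake))  # {'train.csv': 's3://my-bucket/data/train.csv'}
```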