We are new to WandB and working out best practices for using referenced artifacts.
We have an S3 bucket where we keep our data corpus so that it can be shared between machines.
We want to use WandB to track these files and to download/synchronize copies of the datasets to local machines.
There seem to be two ways to add files to an artifact:
- By group: create each file locally and upload it to S3; repeat until all the data files are in place. Then call `artifact.add_reference()` once, pointing it at the S3 prefix/directory for the files. This adds the directory and its files to the artifact. The artifact will report that only a single "file" exists (since we only added the directory), which I think is weird, by the way, but all the files seem to be there.
- One by one: create each file locally, upload it to S3, and immediately add that single S3 object to the artifact with `artifact.add_reference()`. Repeat until all files are done, then close the artifact and the run. The artifact will now properly report that n files were added. (A sketch of both approaches is below.)
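For reference, here is roughly what we are doing in both cases. The project, artifact, bucket, and key names are placeholders for our real ones:

```python
import wandb

# Approach 1: by group — a single reference to the whole S3 prefix.
# The files were already uploaded to S3 (e.g. via boto3 or the aws cli).
with wandb.init(project="my-project", job_type="upload") as run:
    artifact = wandb.Artifact("my-dataset", type="dataset")
    artifact.add_reference("s3://my-bucket/really/deep/s3/path/to/my/dataset/files")
    run.log_artifact(artifact)

# Approach 2: one by one — a separate reference per S3 object.
with wandb.init(project="my-project", job_type="upload") as run:
    artifact = wandb.Artifact("my-dataset", type="dataset")
    for key in ["part-0001.parquet", "part-0002.parquet"]:  # placeholder keys
        artifact.add_reference(
            f"s3://my-bucket/really/deep/s3/path/to/my/dataset/files/{key}"
        )
    run.log_artifact(artifact)
```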
The real question is: when I later execute a `download(root=my_local_path)` operation, will I be able to cleanly load the files from the artifact into my local directory, i.e. without having to fight a path mismatch between the S3 paths and my local paths?
That is, if the S3 path is: `/really/deep/s3/path/to/my/dataset/files`
and my local path is: `/Users/user/`,
can the files end up here: `/Users/user/dataset/files`?
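For context, this is the consumer-side call we have in mind; the artifact name and local root are placeholders, and whether the downloaded layout is relative to the reference prefix or mirrors the full S3 path is exactly what we are unsure about:

```python
import wandb

# Consumer side: pull the referenced dataset down to a chosen local root.
with wandb.init(project="my-project", job_type="download") as run:
    artifact = run.use_artifact("my-dataset:latest")
    local_dir = artifact.download(root="/Users/user")
    print("files downloaded to:", local_dir)
```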
Thank you,
Kevin