Strategy for adding referenced files to an artifact

We are new to WandB and are working out best practices for using reference artifacts.
We have an S3 bucket where we keep our data corpus, so that it can be shared between machines.
We want to use WandB to track these files and to download/synchronize copies of the datasets to local machines.
There seem to be two ways to add files to an artifact:

  1. By-Group: Create each file locally and put it to S3. Repeat until all data files are uploaded. Then call artifact.add_reference() once and point it at the S3 prefix/directory that holds the files. This adds the directory and its files to the artifact. The artifact reports that only a single “file” exists (since we only added the directory), which I think is weird, by the way, but all the files seem to be there.
  2. One-by-one: Create each file locally, put it to S3, and immediately add that S3 file to the artifact. Repeat until all files are done, then close the artifact and the run. The artifact now correctly reports that n files have been added.
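The two approaches above could be sketched roughly as follows. This is only a minimal illustration, not a recommended implementation: the bucket name, project name, artifact name, and helper function names are all hypothetical, and it assumes the files are already in S3 and that AWS credentials are configured for WandB to read the bucket.

```python
from typing import Iterable


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key or prefix."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def add_by_group(artifact, bucket: str, prefix: str) -> None:
    """Option 1: a single add_reference() call on the whole S3 prefix."""
    artifact.add_reference(s3_uri(bucket, prefix))


def add_one_by_one(artifact, bucket: str, keys: Iterable[str]) -> None:
    """Option 2: one add_reference() call per S3 object."""
    for key in keys:
        artifact.add_reference(s3_uri(bucket, key))


def log_dataset(bucket: str, prefix: str) -> None:
    """Create a run, attach the references, and log the artifact.

    The project and artifact names here are placeholders.
    """
    import wandb  # imported here so the helpers above work without wandb

    run = wandb.init(project="my-project", job_type="dataset-upload")
    artifact = wandb.Artifact("my-dataset", type="dataset")
    add_by_group(artifact, bucket, prefix)  # or add_one_by_one(...)
    run.log_artifact(artifact)
    run.finish()
```

For example, log_dataset("my-bucket", "/really/deep/s3/path/to/my/dataset/files") would log one reference covering everything under that prefix.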

The real question is: when I later execute a download(root=my-local-path) operation, will I be able to cleanly load the files from the artifact into my local directory, without having to fight a path mismatch between the S3 paths and my local paths?
For example, if the S3 path is: /really/deep/s3/path/to/my/dataset/files
And my local path is: /Users/user/
Can the files end up here: /Users/user/dataset/files

Thank you,

Hi @kevinashaw, when downloading the Artifact, set the argument recursive=True and define the root directory to download the Artifact to. The files will download cleanly.

Hi @kevinashaw, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

@mohammadbakir Thank you for the response. However, my core question still stands: what sets the base path or root path for a given file? If I add the files one by one, how does it know the root path of each file?
For example:
If the S3 path for a file is: /really/deep/s3/path/to/my/dataset/files/my_file
And my local path is: /Users/user/
Do the files end up here: /Users/user/dataset/files/my_file
Or do I get: /Users/user/really/deep/s3/path/to/my/dataset/files/my_file

How do I control this?
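One possible way to control this, sketched below, is the optional name argument of artifact.add_reference(). To my understanding, the name becomes the entry's path inside the artifact, and download(root=...) writes each entry to root plus that path; if no name is given for a single object, the entry appears to default to the file's base name rather than the full S3 key. Treat that as an assumption and verify it by inspecting artifact.manifest.entries. The bucket name and the helper functions here are hypothetical.

```python
import posixpath


def entry_name(s3_key: str, strip_prefix: str) -> str:
    """Return s3_key relative to strip_prefix, for use as a reference name."""
    key = s3_key.strip("/")
    prefix = strip_prefix.strip("/")
    if not prefix:
        return key
    if not key.startswith(prefix + "/"):
        raise ValueError(f"{s3_key!r} is not under {strip_prefix!r}")
    return key[len(prefix) + 1:]


def add_file_reference(artifact, bucket: str, s3_key: str, strip_prefix: str) -> str:
    """Add one S3 object to the artifact under a chosen relative path."""
    name = entry_name(s3_key, strip_prefix)
    artifact.add_reference(f"s3://{bucket}/{s3_key.lstrip('/')}", name=name)
    return name


def expected_local_path(root: str, name: str) -> str:
    """Where the entry should land after download(root=root), assuming
    entries are written to root joined with the entry name."""
    return posixpath.join(root, name)
```

With the paths from the question, entry_name("/really/deep/s3/path/to/my/dataset/files/my_file", "/really/deep/s3/path/to/my") gives dataset/files/my_file, so artifact.download(root="/Users/user/") should then place the file at /Users/user/dataset/files/my_file under the assumption above.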

@mohammadbakir Do you have any thoughts on this? Thank you.
