I have a large dataset consisting of multiple files on my local file system that I would like to track. It is not part of a GitHub repository since the files are quite large (around 30 GB in total, with each file about 0.5 GB).
I added references to these files in W&B as a reference artifact (a sketch of the command is shown below).
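Roughly, a minimal sketch of what I'm running; the project name and dataset path are placeholders, assuming the standard `wandb.Artifact.add_reference` API:

```python
import wandb

# Minimal sketch: track local files by reference (metadata only).
# "reference-dataset-example" and /data/my_dataset are placeholders.
run = wandb.init(project="reference-dataset-example", job_type="upload")

artifact = wandb.Artifact("my-dataset", type="dataset")
# file:// references record checksums and sizes but do not upload the data itself.
artifact.add_reference("file:///data/my_dataset")

run.log_artifact(artifact)
run.finish()
```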
Now, if I change these files and log them in a run, I can see that the version of the artifact changes on the web UI. In the future, if I want to use an older version of this dataset, is there a way to do so?
I’m assuming not, since W&B is only tracking the references, and so there would be no way of going back to the old dataset.
Hi @chaitanya-kolluru, thanks for your question! When using reference artifacts, we only keep track of the metadata associated with the files, not the files themselves. If your bucket has object versioning enabled, we will retrieve the object version corresponding to the state of the file at the time the artifact was logged. This means that as you evolve the contents of your bucket, you can still point to the exact iteration of your data a given model was trained on, since the artifact serves as a snapshot of your bucket at the time of training. Please let me know if this is helpful, and don’t hesitate to ask any other questions you may have!
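For example, here is a minimal sketch of pulling a pinned version back into a later run; the artifact and project names are the same placeholders as above, and retrieval assumes the referenced files are still available at the recorded version:

```python
import wandb

# Minimal sketch: use a specific, older version of the dataset in a later run.
run = wandb.init(project="reference-dataset-example", job_type="train")

# ":v0" pins the first logged version; ":latest" resolves to the newest one.
artifact = run.use_artifact("my-dataset:v0")

# For reference artifacts, download() resolves the stored references and copies
# the files locally, provided the referenced location is still reachable.
data_dir = artifact.download()

run.finish()
```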