Version tracking for artifacts added by reference (files on local system)

I have a large dataset made up of multiple files on my local file system that I would like to track. It is not part of a GitHub repository because the files are quite large (about 30 GB in total, with each file being 0.5 GB).

I added references to these files in W&B using the command

artifact.add_reference(name='data_folder', uri='file://path/to/directory')
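For context, here is a minimal sketch of how that call might sit inside a complete logging step. The project name, artifact name, and the two helper functions are hypothetical and chosen only for illustration; the `wandb` calls (`wandb.init`, `wandb.Artifact`, `add_reference`, `log_artifact`) are the ones I believe the library provides. The helper `directory_uri` builds the URI with `pathlib`, which for an absolute path yields the `file:///...` form with three slashes.

```python
from pathlib import Path


def directory_uri(directory):
    """Turn a local directory path into an absolute file:// URI."""
    return Path(directory).expanduser().resolve().as_uri()


def log_directory_reference(project, artifact_name, directory):
    """Log a reference artifact pointing at a local directory (hypothetical helper)."""
    import wandb  # third-party; imported lazily so directory_uri works without it

    run = wandb.init(project=project, job_type="dataset-upload")
    artifact = wandb.Artifact(name=artifact_name, type="dataset")
    # Only metadata about the files is recorded; the files stay where they are.
    artifact.add_reference(directory_uri(directory), name="data_folder")
    run.log_artifact(artifact)
    run.finish()
```

Calling `log_directory_reference` again after the files change should produce a new artifact version, which matches the behaviour described in the next paragraph.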

Now, if I change these files and log them in a run, I can see that the version of the artifact changes on the web UI. In the future, if I want to use an older version of this dataset, is there a way to do so?

I’m assuming not, because W&B only tracks the references, so there is no way to go back to the old dataset.

Any help would be appreciated.

Thanks,
Chaitanya

Hi @chaitanya-kolluru, thanks for your question! When using reference artifacts, we only keep track of the metadata associated with the files, not the files themselves. If your bucket has object versioning enabled, we will retrieve the object version corresponding to the state of the file at the time the artifact was logged. This means that as the contents of your bucket evolve, you can still point to the exact iteration of your data that a given model was trained on, since the artifact serves as a snapshot of your bucket at the time of training. Please let me know if this is helpful, and don’t hesitate to ask any other questions you may have!
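To make "metadata only" concrete, here is a rough sketch using just the standard library. It records a size and digest per file and later reports which files no longer match that record. This is an illustration of the idea, not W&B's actual implementation: the function names, the manifest layout, and the choice of SHA-256 are all assumptions. The reply above speaks of versioned buckets; as far as I understand, a plain local file system keeps no old object versions, so a recorded digest can tell you that a file changed but cannot bring the old bytes back unless you kept a copy yourself. The last function shows how an older version could be requested by name, assuming a hypothetical artifact name and the `name:v0` version syntax.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path to its size and SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large files are never fully in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = {
                "size": path.stat().st_size,
                "sha256": digest.hexdigest(),
            }
    return manifest


def changed_files(recorded, root):
    """Return sorted relative paths whose current state differs from the record."""
    current = build_manifest(root)
    return sorted(
        name
        for name in set(recorded) | set(current)
        if recorded.get(name) != current.get(name)
    )


def fetch_version(project, artifact_name, version):
    """Request a specific artifact version, e.g. version='v0' (hypothetical helper)."""
    import wandb  # third-party; imported lazily so the helpers above run without it

    run = wandb.init(project=project, job_type="dataset-download")
    artifact = run.use_artifact(f"{artifact_name}:{version}")
    # For file references this copies from the referenced paths; my understanding
    # is that it fails if the files no longer match the recorded checksums.
    local_dir = artifact.download()
    run.finish()
    return local_dir
```

In practice this suggests writing each new dataset version to its own directory rather than overwriting files in place, so that every logged version still has matching files to point at.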