I would like to use Artifacts to log and track the usage of my datasets. These datasets live on a local filesystem. I was able to create a reference artifact, but the problem I encounter is that the only way to access the original local filepath is to call `artifact.download()` or `artifact.get_path(name).ref`.
Calling `download()` doesn't work for me because the files are already local and very large; I definitely do not want to make a copy.
On the other hand, `artifact.get_path(name).ref` works, but this entails already knowing the path of the file, since that is what is used for `name` as far as I can tell. Even if I could set a custom `name` for each file in the directory (can you?), I don't think those names can be retrieved from the artifact itself, so anyone using the artifact would still need to know them in advance. Ideally, one would only need the artifact's name, and from there could see the local file paths for all of the files in that artifact.
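For what it's worth, once I do have a `ref` URI, turning it back into a local path is easy with the standard library (a small sketch; the helper name is mine), so the only piece I'm missing is a way to enumerate the refs themselves:

```python
from urllib.parse import urlparse, unquote


def ref_to_local_path(ref: str) -> str:
    """Convert a file:// reference URI back to a local filesystem path."""
    parsed = urlparse(ref)
    if parsed.scheme != "file":
        raise ValueError(f"expected a file:// URI, got {ref!r}")
    # unquote handles percent-encoded characters (spaces, etc.) in the path
    return unquote(parsed.path)


print(ref_to_local_path("file:///data/train/images.npy"))
# /data/train/images.npy
```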
In case it's helpful, I add these files to the artifact by doing:

```python
artifact.add_reference(name='data_folder', uri='file://path/to/directory')
```
When I use the artifact, I can do `files = artifact.files()`, which returns an iterable of all of the files, but these `File` objects do not have a way to get the path/URI either.
Is there any way to do this?
Thanks, and let me know if you have any questions that would help you understand or solve this.