Access local filesystem artifacts without downloading

I would like to use Artifacts to log and track the usage of my datasets. These datasets live on a local filesystem. I was able to create a reference artifact, but the problem I encounter is that the only way to access the original local filepath is to call artifact.download() or artifact.get_path(name).ref.

Calling download() doesn’t work for me because the files are already local and very large. I definitely do not want to make a copy.

On the other hand, artifact.get_path(name).ref works, but it requires already knowing the path of the file, since as far as I can tell that is what is used for name. Even if I could set a custom name for each file in the directory (can you?), I’m not sure those names can be retrieved from the artifact itself, so anyone using the artifact would need to know them in advance. Ideally one would only need the artifact’s name, and from there could see the local file paths for all of the files in that artifact.

In case it’s helpful, I add these files to the artifact by doing:

artifact.add_reference(name='data_folder',uri='file://path/to/directory')

When I use the artifact, I can do

files = artifact.files()

which returns an iterable of all of the files, but these File objects do not have a way to get the path/uri either.

Is there any way to do this?

Thanks and let me know if you have any questions that will help you understand or solve this.

Ope… I just figured it out… seems like you can introspectively obtain the refs by looking through the manifest.

Hi @shababo-sci, thanks for writing in, and glad to hear you figured this out! I will also post a code snippet here for future reference.

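import wandb

# Fetch the artifact through the public API (no files are downloaded)
api = wandb.Api()
artifact = api.artifact('entity/project/artifact-name:alias', type='artifact-type')

# First option: read the reference URI of each manifest entry
for name, entry in artifact.manifest.entries.items():
    print(entry.ref)

# Second option: read the URL of each file in the artifact
for f in artifact.files():
    print(f.url)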
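If you need plain filesystem paths rather than file:// URIs, a minimal sketch along these lines might help. The helper name local_paths_from_refs is hypothetical (it is not part of wandb), and it assumes the references are simple file:// URIs with no real host component, so the part after file:// is treated as the path whether it was written with two or three slashes.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_paths_from_refs(refs):
    """Map entry names to local paths for file:// references.

    `refs` is a dict of {entry_name: reference_uri_or_None}.
    Entries without a reference, or with a non-file scheme, are skipped.
    """
    paths = {}
    for name, ref in refs.items():
        if ref is None:
            continue
        parsed = urlparse(ref)
        if parsed.scheme != "file":
            continue
        # Assumption: no real host in the URI, so netloc + path is the
        # filesystem path for both file:///abs/path and file://rel/path.
        paths[name] = Path(unquote(parsed.netloc + parsed.path))
    return paths


# Usage with the artifact from the snippet above (not executed here):
# refs = {name: entry.ref for name, entry in artifact.manifest.entries.items()}
# for name, path in local_paths_from_refs(refs).items():
#     print(name, path)
```

This way someone who only knows the artifact's name can list every referenced file without calling download().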

I hope this helps, feel free to ask us any further questions!

