How to download a specific *folder* from an artifact?

Hi –

I’ve created a distributed run and created a single artifact successfully from many parallel processes with the artifact living as a reference on GCP. Very cool!

Now I would like to download only one subfolder from the artifact:

e.g.

root/
→ hourly/
→ monthly/

I can successfully download the whole artifact:

artifact = run.use_artifact(...)
artifact.download()

I can successfully download a single file from the artifact:

artifact = run.use_artifact(...)
path = artifact.get_path("hourly/path_to_file.hdf5")
location = path.download()

However, what I really want to do is download the entire hourly directory, i.e.:

artifact = run.use_artifact(...)
path = artifact.get_path("hourly")
location = path.download()

However this does not seem to work. I’ve tried a variety of syntaxes for entering the folder path with no luck.

Is this possible? Am I missing something in the docs?

A related solution would be to programmatically list all the files within hourly and then call get_path and download on each of them individually, but I also couldn’t figure out how to introspect an artifact from within Python before downloading it.

An alternative approach which I would prefer not to do is to split out the hourly and monthly subfolders into their own artifacts.

Guidance would be much appreciated!

Hi @szvsw, thanks for writing in! You can iterate over the files in that artifact and filter them by path prefix, like this:

for file in artifact.files():
    if file.name.startswith("hourly/"):
        print(file)
        file.download()
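If you'd rather keep that filtering out of your top-level code, here is a minimal sketch of a wrapper. Note that `download_prefix` is a hypothetical helper and not part of the wandb library. It assumes that `artifact.manifest.entries` is a dict keyed by entry path and that `artifact.get_path(name).download()` returns the local path, as in the single-file example earlier in the thread. I have not verified it against a reference artifact on GCP, so treat it as a starting point.

```python
from typing import Any, List


def download_prefix(artifact: Any, prefix: str) -> List[str]:
    """Download every artifact entry whose path starts with `prefix`.

    Hypothetical helper (not a wandb API). Assumes `artifact.manifest.entries`
    maps entry paths to manifest entries, and that
    `artifact.get_path(name).download()` returns the local file path.
    """
    # Normalize the prefix so "hourly" does not also match "hourly_backup/...".
    prefix = prefix.rstrip("/") + "/"
    local_paths = []
    # Sort the entry names so the download order is deterministic.
    for name in sorted(artifact.manifest.entries):
        if name.startswith(prefix):
            local_paths.append(artifact.get_path(name).download())
    return local_paths
```

You would call it as `paths = download_prefix(artifact, "hourly")` after `artifact = run.use_artifact(...)`. Depending on your installed wandb version, `artifact.download()` may also accept a `path_prefix` argument that does this filtering for you, so it is worth checking the API reference for your release first.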

Please let me know if that’s helpful!

gotcha! Easy enough. It might be nice to add a method that moves that logic into the WandB library to keep top level code a little cleaner, since I imagine this is a somewhat common thing to do. Anyways next time I suppose I may just make separate artifacts… in any case this works, thanks!

Thanks for sharing this feedback @szvsw! I’ll create a feature request for this

Thanks for the solution @luis_bergua! I agree with @szvsw that it would be nice to see this in the API.
