Hi –
I’ve created a distributed run and created a single artifact successfully from many parallel processes with the artifact living as a reference on GCP. Very cool!
Now, I would only like to download a subfolder from the artifact:
e.g.
root/
→ hourly/
→ monthly/
I can successfully download the whole artifact:
artifact = run.use_artifact(...)
artifact.download()
I can successfully download a single file from the artifact:
artifact = run.use_artifact(...)
path = artifact.get_path("hourly/path_to_file.hdf5")
location = path.download()
What I really want to do, though, is download the entire hourly directory, i.e.:
artifact = run.use_artifact(...)
path = artifact.get_path("hourly")
location = path.download()
However, this does not seem to work. I’ve tried a variety of syntaxes for the folder path with no luck.
Is this possible? Am I missing something in the docs?
A related solution would be to programmatically list all the files within hourly and then call get_path and download on them individually, but I also couldn’t figure out how to introspect an artifact from within Python before downloading it.
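To make that workaround concrete, here is a rough sketch of the filtering step only. It assumes I already had a plain list of the artifact's file paths; the `all_paths` list and the `paths_under` helper are hypothetical names for illustration, and getting that list out of the artifact is exactly the part I could not figure out.

```python
from pathlib import PurePosixPath


def paths_under(all_paths, folder):
    """Return the paths that sit inside `folder`, at any depth."""
    root = PurePosixPath(folder)
    # A path is inside `folder` when `folder` is one of its parent directories.
    return [p for p in all_paths if root in PurePosixPath(p).parents]


# Hypothetical listing; obtaining this from the artifact is the missing piece.
all_paths = [
    "hourly/a.hdf5",
    "hourly/sub/b.hdf5",
    "monthly/c.hdf5",
    "hourly_extra/d.hdf5",
]

hourly = paths_under(all_paths, "hourly")

# Each entry in `hourly` could then be fetched the same way as a single file:
#     artifact.get_path(p).download()
```

Matching on parent directories rather than a plain string prefix keeps a sibling folder such as hourly_extra from being picked up by accident.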
An alternative approach, which I would prefer not to take, is to split the hourly and monthly subfolders out into their own artifacts.
Guidance would be much appreciated!