I’ve set up a distributed run and successfully created a single artifact from many parallel processes, with the artifact living as a reference on GCP. Very cool!
Now I would like to download only a subfolder from the artifact.
I can successfully download the whole artifact:
artifact = run.use_artifact(...)
artifact.download()
I can successfully download a single file from the artifact:
artifact = run.use_artifact(...)
path = artifact.get_path("hourly/path_to_file.hdf5")
location = path.download()
However what I really want to do is download the entire monthly directory, i.e.:
artifact = run.use_artifact(...)
path = artifact.get_path("hourly")
location = path.download()
This does not seem to work, though. I’ve tried several ways of writing the folder path, with no luck.
Is this possible? Am I missing something in the docs?
A related solution would be to programmatically list all the files within hourly and then call download on them individually, but I also couldn’t figure out how to introspect an artifact from within Python before downloading it.
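For reference, here is a rough sketch of what I imagine the list-then-download idea would look like. It assumes that artifact.manifest.entries is a dict keyed by each file's path inside the artifact, which I have not confirmed for reference artifacts; download_subfolder is just a name I made up, not part of wandb.

```python
def download_subfolder(artifact, prefix):
    """Download every file whose path inside the artifact starts with prefix/.

    Hypothetical helper, not part of wandb.
    """
    # Normalise so that "hourly" and "hourly/" behave the same and
    # "hourly" does not also match a sibling such as "hourly_backup/".
    prefix = prefix.rstrip("/") + "/"

    # Assumption: artifact.manifest.entries maps in-artifact paths to entries.
    names = sorted(
        name for name in artifact.manifest.entries if name.startswith(prefix)
    )

    # Same single-file call that already works for one path.
    return [artifact.get_path(name).download() for name in names]
```

I would then call it as download_subfolder(run.use_artifact(...), "hourly") and get back a list of local file locations. Newer wandb releases may also accept a path_prefix argument to artifact.download, but I have not verified that for my version.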
An alternative approach, which I would prefer to avoid, is to split the monthly subfolders out into their own artifacts.
Guidance would be much appreciated!