add_reference() with nested folders

We need to log a dataset folder in S3 as an artifact. The folder has many subdirectories (only one level deep, though).
When I use the add_reference() command, it only stores the top-level directory names.
Of course, I could loop over the subdirectories, but I'm wondering whether there is an option to make the operation recursive.

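import wandb

# WB_PROJECT, WB_ENTITY, WB_DATASET, WB_MAX_OBJECTS_TO_UPLOAD and s3_full are defined elsewhere
run = wandb.init(project=WB_PROJECT)
art = wandb.Artifact(WB_ENTITY, type=WB_DATASET)  # first argument is the artifact's name
art.add_reference(s3_full, max_objects=WB_MAX_OBJECTS_TO_UPLOAD)  # s3_full: "s3://bucket/prefix" of the dataset folder
run.log_artifact(art)
wandb.finish()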

EDIT 1: I concluded that not all of the files are being added because “Num Files” in the Artifact Overview shows only 5. When I click on the directories I do seem to see the files, but I assume they are not actually there, given that the file count is reported as 5.
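One way to check whether the files really are in the artifact, independently of the “Num Files” field, is to count the manifest entries yourself. This is a minimal sketch that assumes you already have the list of entry paths. With the public API that list is presumably available as something like `wandb.Api().artifact("entity/project/name:latest").manifest.entries.keys()`, but treat that access path as an assumption for your SDK version. The helper `count_files_by_top_level` is hypothetical and has no wandb dependency:

```python
from collections import Counter


def count_files_by_top_level(paths):
    """Count artifact entries per top-level folder.

    Files that sit directly at the artifact root are counted under '.'.
    """
    counts = Counter()
    for path in paths:
        top, sep, _ = path.partition("/")
        # `sep` is empty when the path has no '/', i.e. a root-level file
        counts[top if sep else "."] += 1
    return counts


# Example: two folders plus one root-level file
paths = ["train/a.png", "train/b.png", "val/c.png", "readme.txt"]
counts = count_files_by_top_level(paths)
print(sum(counts.values()), dict(counts))
```

If the total comes out in the hundreds of thousands while the overview still says 5, the display field is miscounting rather than the files being missing.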

Hi Kevin,

Thanks for your question! I think the only options that may work for you are adding an S3 prefix without an explicit name (documentation here) or using a loop. Please let me know if this is helpful.
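If you do want the loop approach, here is a minimal sketch that calls `add_reference()` once per first-level subdirectory. It assumes you have already listed the object keys under the dataset prefix (for example with boto3's `list_objects_v2` paginator) and that `name=` sets the folder each reference's files land under inside the artifact. `add_subdirectory_references` is a hypothetical helper, not part of wandb:

```python
from urllib.parse import urlparse


def add_subdirectory_references(artifact, base_uri, keys, max_objects=None):
    """Call artifact.add_reference() once per first-level subdirectory of base_uri.

    `keys` are object keys in the bucket; `artifact` is expected to be a
    wandb.Artifact (anything with an add_reference method works here).
    Returns the subdirectory URIs that were added.
    """
    parsed = urlparse(base_uri)  # "s3://bucket/data" -> netloc "bucket", path "/data"
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    # Keep only keys inside a subdirectory; root-level files have no '/' left
    subdirs = sorted({
        key[len(prefix):].split("/", 1)[0]
        for key in keys
        if key.startswith(prefix) and "/" in key[len(prefix):]
    })

    added = []
    for name in subdirs:
        uri = f"s3://{bucket}/{prefix}{name}/"
        # Assumed: name= places this reference's files under `name/` in the artifact
        artifact.add_reference(uri, name=name, max_objects=max_objects)
        added.append(uri)
    return added
```

In real use you would pass the `wandb.Artifact` before calling `run.log_artifact()`. Files sitting directly in the top-level folder are skipped by this sketch and would need their own `add_reference()` call.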

Best,
Luis

@system (luis) Thank you for the reply. I think the core question is: why does “Num Files” (in the Artifact view) incorrectly count only the top-level files? I may have 200K files in the artifact, but “Num Files” says only “5”. See example below. Could this be fixed? It is somewhat distressing for this field to be so far off.

Hi Kevin,

Thanks for the detailed explanation! I see the issue, and I will create a feature request for it. Thanks for reporting it! Is there anything else I can help you with?

Best,
Luis
