Listing files of reference artifacts with a temporarily mounted folder (Azure)

I am trying to make wandb work with Azure for versioning my datasets.
My dataset is too big to upload, so I am keeping it in Azure and adding it by reference.
I am using the file based reference (file:///) for a folder that is mounted to the compute instance.
Registering the dataset, checksumming it, etc. all works fine.

My problem now is how to USE the artifact.

Since Azure mounts the folder under a randomly generated name each time, I cannot use the stored reference path. What I am doing right now is using the keys of the manifest entries:
artifact.manifest.entries.keys()
This gives me all the file names, and I manually concatenate each one with the path of the mounted folder.
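For illustration, the manual join described above could look roughly like the sketch below. The helper name `resolve_entries`, the artifact name, and the mount path are hypothetical; the only wandb-specific assumption is that the manifest keys are forward-slash relative paths, as `artifact.manifest.entries.keys()` returns them.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List


def resolve_entries(entry_names: Iterable[str], mount_root: str) -> List[Path]:
    """Map artifact-relative entry names onto the current mount point.

    Hypothetical helper, not part of wandb.
    """
    root = Path(mount_root)
    # Manifest keys are assumed to use forward slashes; splitting them into
    # parts keeps the join independent of the host OS path separator.
    return [
        root.joinpath(*PurePosixPath(name).parts)
        for name in sorted(entry_names)
    ]


# Assumed usage with wandb (not executed here; names are placeholders):
#   artifact = run.use_artifact("my-dataset:latest")
#   paths = resolve_entries(artifact.manifest.entries.keys(), "/mnt/azure_xyz")
```

This only rebuilds the paths; it does not check that the files under the mount still match what was registered.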

Is there a better, less hacky way of doing it? (Or even a better way to use Azure, given that wandb supports S3 and GCS?)
.download() is not an option, since the dataset is too big and the mounted folder works fine. .checkout() does not work, since the folder also contains other files which I do not want to delete. .get() and similar also don't work, since I don't know the file paths.

In my ideal world I would just have a function artifact.files(root="mount_path", verify=True) which returns a list of all file names and verifies via checksum that they are correct, so I can just use the dataset and be sure it is the same one.
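A function like that does not exist in wandb as far as this thread establishes, but a minimal sketch of the idea is possible with the standard library alone. The names `files` and `md5_b64` below are hypothetical. The sketch assumes that each manifest entry's `digest` is the base64-encoded MD5 of the file contents, which is only expected when the reference was added with checksumming enabled; please verify this against your own manifest before relying on it.

```python
import base64
import hashlib
from pathlib import Path, PurePosixPath
from typing import Dict, List


def md5_b64(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the base64-encoded MD5 of a file, read in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


def files(digests: Dict[str, str], root: str, verify: bool = True) -> List[Path]:
    """Resolve entry names under `root` and optionally verify checksums.

    `digests` maps artifact-relative names to expected base64 MD5 digests.
    Hypothetical helper, not part of wandb.
    """
    resolved = []
    for name, expected in sorted(digests.items()):
        path = Path(root).joinpath(*PurePosixPath(name).parts)
        if verify:
            if not path.is_file():
                raise FileNotFoundError(f"missing artifact file: {path}")
            actual = md5_b64(path)
            if actual != expected:
                raise ValueError(
                    f"checksum mismatch for {name}: "
                    f"expected {expected}, got {actual}"
                )
        resolved.append(path)
    return resolved


# Assumed usage with wandb (not executed here; relies on the digest
# assumption stated above):
#   digests = {n: e.digest for n, e in artifact.manifest.entries.items()}
#   paths = files(digests, root="/mnt/azure_xyz", verify=True)
```

Extra files in the mounted folder are simply ignored, since only names listed in the manifest are resolved. Note that verification reads every file once, which can be slow on a large dataset.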

Thank you! Artifacts are such a great addition to wandb and I would love to use them :slight_smile:

Hi DesertGator,

It looks like add_reference (https://docs.wandb.ai/ref/python/public-api/artifact#add_reference) is what you’re looking for. I’m going to double-check with our engineering team to make sure that we have this feature for Azure, though.

Warmly,
Leslie

I double-checked, and as long as the path to the file is accessible over HTTPS, you should be able to use add_reference.

Warmly,
Leslie

Was this able to help you, or are you still experiencing this problem?

Hi DesertGator, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
