Listing files of reference artifacts with temporary mounted folder (Azure)

desertgator · November 25, 2021, 9:53am

I am trying to make wandb work with Azure for versioning my datasets.
My dataset is too big to upload, so I am keeping it in Azure and adding it by reference.
I am using a file-based reference (file:///) for a folder that is mounted on the compute instance.
Registering the dataset, checksumming it, etc. all works fine.

My problem is now how I USE the artifact.

Since Azure mounts the folder under a randomly generated name each time, I cannot use the stored reference path. What I am doing right now is using the keys of the manifest entries:
artifact.manifest.entries.keys()
This gives me all the filenames, and I manually concatenate each one with the mounted folder path.
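
A minimal sketch of the workaround described above, written without any wandb calls so it can be checked on its own. The helper name `resolve_entries` and the mount path in the usage note are hypothetical; the only assumption about wandb is that the manifest keys are relative, forward-slash paths.

```python
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable


def resolve_entries(entry_names: Iterable[str], mount_root: str) -> Dict[str, Path]:
    """Map each manifest entry name to its location under the current mount."""
    root = Path(mount_root)
    # Split on forward slashes so the join also works on non-POSIX hosts.
    return {name: root.joinpath(*PurePosixPath(name).parts) for name in entry_names}
```

It would be called as `resolve_entries(artifact.manifest.entries.keys(), mount_path)`, where `mount_path` is whatever folder Azure generated for the current session.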

Is there a better, less hacky way of doing it? (Or even a better way to use Azure, since wandb supports S3 and GCS?)
.download() is not an option since the dataset is too big and the mounted folder is fine. .checkout() does not work, since the folder also contains other files which I do not want to delete. .get() and similar also don't work since I don't know the file paths.

In my ideal world I would just have a function artifact.files(root="mount_path", verify=True) which returns a list of all filenames and verifies they are correct via checksum, so I can just use the dataset and be sure it is the same one.
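
No such method is confirmed anywhere in this thread, but a user-side helper with the same shape could look roughly like the sketch below. It is not part of wandb: `files` and `md5_b64` are hypothetical names, and the helper takes a plain dict of entry name to expected digest. The assumption that wandb stores a base64-encoded MD5 for file:// references should be checked against the installed wandb version.

```python
import base64
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_b64(path: Path, chunk_size: int = 1 << 20) -> str:
    """Base64-encoded MD5 of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def files(expected: Dict[str, str], root: str, verify: bool = True) -> List[Path]:
    """Return local paths for every entry, optionally checking each digest.

    `expected` maps an entry name to its expected base64 MD5 digest.
    Raises FileNotFoundError or ValueError on the first problem found.
    """
    resolved = []
    for name, want in sorted(expected.items()):
        local = Path(root) / name
        if not local.is_file():
            raise FileNotFoundError(f"missing under mount: {local}")
        if verify and md5_b64(local) != want:
            raise ValueError(f"checksum mismatch for {name}")
        resolved.append(local)
    return resolved
```

Under the digest assumption above, `expected` could be built as `{name: entry.digest for name, entry in artifact.manifest.entries.items()}`. Verification reads every byte of the dataset, so for very large datasets it may be worth running once per mount rather than on every access.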

Thank you! Artifacts are such a great addition to wandb and I would love to use them.

lesliewandb · November 29, 2021, 9:35pm

Hi DesertGator,

It looks like add_reference (https://docs.wandb.ai/ref/python/public-api/artifact#add_reference) is what you’re looking for. I’m going to double-check with our engineering team to make sure that we have this feature for Azure though.

Warmly,
Leslie

lesliewandb · November 30, 2021, 1:08pm

I double-checked, and as long as the path to the file is accessible over HTTPS, you should be able to use add_reference.

Warmly,
Leslie
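
For illustration, registering a blob over HTTPS as suggested above might look like the sketch below. The helper names, project name, and job type are placeholders, and it assumes the blob is readable at its plain URL (public, or with a SAS token appended by the caller), which the thread does not confirm for this setup.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the HTTPS URL of an Azure blob (no SAS token handling)."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_path)}"


def log_reference(project: str, artifact_name: str, url: str, entry_name: str) -> None:
    """Register one HTTPS reference as a new artifact version."""
    import wandb  # imported lazily so blob_url works without wandb installed

    run = wandb.init(project=project, job_type="dataset-registration")
    artifact = wandb.Artifact(artifact_name, type="dataset")
    # The file stays in Azure; only the reference and its metadata are logged.
    artifact.add_reference(url, name=entry_name)
    run.log_artifact(artifact)
    run.finish()
```

A reference registered this way points at a stable URL instead of a mount path, so it does not change when the folder is remounted under a new name.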

lesliewandb · December 3, 2021, 2:27pm

Was this able to help you, or are you still experiencing this problem?

lesliewandb · December 6, 2021, 4:45pm

Hi DesertGator, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · January 24, 2022, 9:54am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
