External artifacts (added with add_reference) consume wandb storage

I create an artifact with the following code:

import wandb

run = wandb.init(project="grachev", name="dump")
with run:
    artifact = wandb.Artifact("grachev_dataset", type="dataset")
    # Reference the S3 prefix instead of uploading its files.
    artifact.add_reference("s3://alblml/kaggle/preprocessing/240619_175925_SSUW/")
    run.log_artifact(artifact)

After that, I see that the artifact consumes wandb storage space.

As I understand it, wandb should store only metadata for a reference artifact, so no storage should be consumed.
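One way to check this expectation is to look at the logged artifact's manifest and split its entries into references and uploaded files. The sketch below is only an illustration: `summarize_entries` and `inspect_artifact` are hypothetical helper names, and the entity part of the artifact name is a placeholder that has to be filled in. It assumes each manifest entry exposes `ref` and `size`, as entries in `artifact.manifest.entries` do in the wandb public API.

```python
def summarize_entries(entries):
    """Sum the sizes of referenced vs. uploaded manifest entries.

    `entries` maps a path to an object with `ref` and `size` attributes.
    """
    referenced_bytes = 0
    uploaded_bytes = 0
    for entry in entries.values():
        size = entry.size or 0  # size can be missing for some references
        if entry.ref is not None:
            referenced_bytes += size  # data stays in the external bucket
        else:
            uploaded_bytes += size  # data was uploaded to wandb
    return {"referenced_bytes": referenced_bytes, "uploaded_bytes": uploaded_bytes}


def inspect_artifact(qualified_name):
    """Fetch an artifact through the public API and summarize its manifest."""
    import wandb  # imported here so summarize_entries works without wandb

    artifact = wandb.Api().artifact(qualified_name)
    return summarize_entries(artifact.manifest.entries)


if __name__ == "__main__":
    # "<entity>" is a placeholder; replace it with your own entity or team name.
    print(inspect_artifact("<entity>/grachev/grachev_dataset:latest"))
```

If `uploaded_bytes` is zero, the artifact itself holds only references, and any storage shown in the UI is probably attributed to something else in the run.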

Hi @aleksei-grachev-tech, good day and thank you for reaching out to us! Happy to help you with this.

Let me check this behavior and verify what to expect when using artifact references. In the meantime, could you clarify what type of artifact was referenced here? Are these media files? If yes, does the 67 GB that we are seeing in the UI also correspond to the files being referenced here? This information might help us with our review. Thank you!

In the meantime, could you clarify what type of artifact was referenced here? Are these media files?
It’s a directory with parquet files.

If yes, does the 67 GB that we are seeing in the UI also correspond to the file being referenced here?
It looks strange in the UI. The 67 GB corresponds to the run, but there is no specific file or directory in the UI that consumes 67 GB.
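To look for what the run itself stores, the files attached to the run can be listed by size through the public API. This is a rough sketch, not a confirmed diagnostic: `largest_files` and `report_run_files` are hypothetical helper names, the run path is a placeholder, and the run's file list may not account for everything the UI counts as storage.

```python
def largest_files(files, top_n=10):
    """Return (name, size) pairs for the largest files, biggest first."""
    sized = [(f.name, f.size or 0) for f in files]
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:top_n]


def report_run_files(run_path, top_n=10):
    """Print the largest files attached to a run."""
    import wandb  # imported here so largest_files works without wandb

    run = wandb.Api().run(run_path)
    for name, size in largest_files(run.files(), top_n):
        print(f"{size / 1e9:8.2f} GB  {name}")


if __name__ == "__main__":
    # Placeholder path; replace with "<entity>/<project>/<run_id>" of the run.
    report_run_files("<entity>/grachev/<run_id>")
```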

@paulo-sabile is there any update?

Hi @aleksei-grachev-tech, thank you for the follow-up. We are still investigating this particular artifact, so please allow us more time. I will get back to you with another update. We appreciate your patience. Thank you.

Hi @aleksei-grachev-tech, good day! Just letting you know that I am working with my team to investigate this further. One quick request: could you please provide us with the link/URL of the impacted project? Thank you!

I can’t reproduce the problem anymore. I will keep testing it for some time.

Thank you, @aleksei-grachev-tech. If you experience the same issue again on any of your projects, please tell us which project is impacted so we can review it. Thank you!