During the preparation for a training run (in prepare_data in PyTorch Lightning), I either create or update local data (download, prepare different encodings). I then create a W&B artifact and wait for the upload to complete. Later in the code (in setup() in PyTorch Lightning), I use the data. Strictly speaking, this is not necessary because I have the files locally, but I want to track the usage of the data (and the IDs of the data used for training, validation, …). I added the wait() statement because otherwise wandb would download the previous version (v=n-1) of the data (without the encoding just added).
In mode ONLINE this works nicely. However, in mode DISABLED I get this error: ValueError: Cannot call wait on an artifact before it has been logged or in offline mode. How am I supposed to handle wait() so that it works in all modes? (It would be nice if wait() handled this itself.)
This is the sample code:
# Upload the data
artifact = wandb.Artifact(name=..., type=...)
artifact.description = ...
artifact.metadata = ...
artifact.add_file(local_path=...)
wandb.run.log_artifact(artifact)
artifact.save() # I think I don't need this, playing around because of this issue
artifact.wait()
# Use (Download) the data
artifact = wandb.run.use_artifact(artifact_or_name=... + ":latest")
artifact_entry = artifact.get_path(...)
artifact_entry.download(root=...)
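
One possible workaround, sketched here only as an assumption and not as the behaviour wandb intends, is to wrap the call in a small helper that catches the ValueError quoted above. The helper name wait_if_possible is hypothetical; it relies only on the artifact object having a wait() method, as in the sample code.

```python
def wait_if_possible(artifact) -> bool:
    """Call artifact.wait() and report whether the wait actually happened.

    Hypothetical helper: returns False instead of raising when wait()
    fails with the ValueError reported for DISABLED/offline mode.
    """
    try:
        artifact.wait()
        return True
    except ValueError:
        # Caveat: this also hides the "before it has been logged" case,
        # because both cases are reported through the same exception type.
        return False
```

In the sample above, the line artifact.wait() would then become wait_if_possible(artifact). The return value could be used to decide whether to go through use_artifact() or to fall back to the local files, since in DISABLED mode nothing has been uploaded.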