I’m running the following code to initialize a run and use my dataset as an artifact:
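import wandb

# Start the run and record the script arguments as its config.
run = wandb.init(name=job_name, project=wandb_project_name, config=vars(args), save_code=True, job_type="training")
wandb.run.log_code(".")  # snapshot the source files in the working directory
print(wandb_dataset_name)  # prints the artifact path shown in the log below
dataset = run.use_artifact(wandb_dataset_name)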
This code is in a SageMaker training script, and when I run it as a single training job, everything works as expected. However, when I run the exact same script in a SageMaker hyperparameter tuning job, I get the following error:
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
distributedspectrum/RadioML-Experimentation/RadioML_tfrecords:v0
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: ERROR Project distributedspectrum/RadioML-Experimentation does not contain artifact: "RadioML_tfrecords:v0"
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/normalize.py", line 26, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 937, in artifact
    artifact = Artifact(self.client, entity, project, artifact_name)
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 4151, in __init__
    self._load()
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 4735, in _load
    raise ValueError(
ValueError: Project distributedspectrum/RadioML-Experimentation does not contain artifact: "RadioML_tfrecords:v0"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "radioml-training.py", line 306, in <module>
    main(args)
  File "radioml-training.py", line 175, in main
    dataset = run.use_artifact(wandb_database_name)
  File "/usr/local/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 255, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 2575, in use_artifact
    artifact = public_api.artifact(type=type, name=name)
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/normalize.py", line 62, in wrapper
    raise CommError(message, err).with_traceback(sys.exc_info()[2])
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/normalize.py", line 26, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 937, in artifact
    artifact = Artifact(self.client, entity, project, artifact_name)
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 4151, in __init__
    self._load()
  File "/usr/local/lib/python3.8/site-packages/wandb/apis/public.py", line 4735, in _load
    raise ValueError(
wandb.errors.CommError: Project distributedspectrum/RadioML-Experimentation does not contain artifact: "RadioML_tfrecords:v0"
The script and its arguments are identical in both cases, yet the error only appears in the tuning job. I'm also not sure why the wandb.login() warnings appear: they don't show up in the single training job, and I never call wandb.login() in my code.
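To check whether the two job types really run with the same W&B settings, one option is to print the relevant environment variables right before wandb.init() in both jobs and compare the output. The sketch below is only a diagnostic aid, not a confirmed explanation of the error. The helper name wandb_env_snapshot is hypothetical, and the choice of prefixes is an assumption: WANDB_ covers W&B's own environment variables, while SM_ and TRAINING_JOB_NAME are variables that SageMaker training containers commonly set.

```python
import os


def wandb_env_snapshot(environ=None):
    """Collect W&B- and SageMaker-related environment variables.

    Hypothetical helper for comparing a single training job with a
    tuning job. Values that look like credentials are masked so the
    output is safe to paste into a log.
    """
    environ = os.environ if environ is None else environ
    prefixes = ("WANDB_", "SM_")      # assumption: the variables of interest
    extra = ("TRAINING_JOB_NAME",)    # assumption: set by SageMaker
    snapshot = {}
    for key in sorted(environ):
        if key.startswith(prefixes) or key in extra:
            value = environ[key]
            # Mask anything that looks like a secret.
            if any(word in key for word in ("KEY", "SECRET", "TOKEN")):
                value = "<set>" if value else "<empty>"
            snapshot[key] = value
    return snapshot
```

Calling print(wandb_env_snapshot()) at the top of main() in both jobs gives two dictionaries to diff. A difference in WANDB_ENTITY, WANDB_PROJECT, or whether WANDB_API_KEY is set at all would narrow down where to look next.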