Error in finding artifact when using sagemaker

I can access the artifact from code and from the terminal, but when I run through a SageMaker estimator, the job isn’t able to download it. I get an error message saying that the project does not contain the artifact. I am using the full artifact name, just as in the terminal.

The issue is similar to this post (ERROR: Project does not contain artifact - #2 by mohammadbakir), except that they seem to have gotten the basic use case working. The response there lists one point that I am not sure about:

  • Have you verified your sagemaker host environment variable is referenced correctly prior to executing the training? Run wandb status to check “base_url”. To set it, use export WANDB_BASE_URL=<HOST>:<PORT>

I’m using the default SageMaker estimator, not a custom Docker image. I’m also using the default wandb server.
I do see https://api.wandb.ai as the WANDB_BASE_URL environment variable when I print it out inside the script. I pass it in through the environment parameter of the estimator function (Estimators — sagemaker 2.197.0 documentation).
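
For anyone wanting to repeat the check described above, here is a minimal sketch of a helper that reports which wandb-related environment variables are visible inside the training script. The function name describe_wandb_env is made up for illustration; the variable names are the standard wandb ones, and the API key is masked so it doesn't end up in the SageMaker logs.

```python
import os


def describe_wandb_env(environ=None):
    """Report wandb-related environment variables without leaking the API key.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in ("WANDB_BASE_URL", "WANDB_API_KEY", "WANDB_ENTITY", "WANDB_PROJECT"):
        value = environ.get(name)
        if value is None:
            report[name] = "<unset>"
        elif name == "WANDB_API_KEY":
            # Only show the length, never the secret itself.
            report[name] = f"<set, {len(value)} chars>"
        else:
            report[name] = value
    return report


if __name__ == "__main__":
    for key, shown in describe_wandb_env().items():
        print(f"{key} = {shown}")
```

Calling this at the top of the entry-point script shows whether the values passed through the estimator's environment parameter actually reach the container.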

I’m just wondering if I could be missing something else?

Hey @rahulraj,

Just so I can get a better idea of what’s going on, I have a few questions:

  • Are you able to access the artifact in question through the UI? If so, could you also run the following command to verify that the artifact is accessible:

wandb artifact get <entity>/<project>/<artifact-name>

  • What does your code look like in terms of using this artifact? How are you accessing this in your workflow?

  • If you could give us a reproducible script/pseudocode, I would love to test this out on my end to dig further. And could you also send the full error stack trace?
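
The same check as the CLI command above can be done from Python through the public API, which is a rough sketch of what one could run both locally and inside the SageMaker job to compare results. The helper names qualified_artifact_name and fetch_artifact are hypothetical, and the entity and project arguments are placeholders for your own values; wandb.Api().artifact(...) and .download() are the existing wandb calls.

```python
def qualified_artifact_name(entity, project, name, alias="latest"):
    """Build the '<entity>/<project>/<name>:<alias>' string used to look up an artifact."""
    if ":" in name:
        # The name already carries a version or alias, e.g. 'my_data:v10'.
        return f"{entity}/{project}/{name}"
    return f"{entity}/{project}/{name}:{alias}"


def fetch_artifact(entity, project, name, alias="latest"):
    """Look the artifact up through the public API and download it.

    Returns the local directory that the files were downloaded to.
    """
    import wandb  # imported lazily so the name helper works without wandb installed

    api = wandb.Api()
    artifact = api.artifact(qualified_artifact_name(entity, project, name, alias))
    return artifact.download()
```

If fetch_artifact succeeds locally but fails inside the training job with the same arguments, that would point at the credentials available in the container rather than at the artifact name.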

Thank you!

Yes I can access through UI and terminal.

Simple test script test_wandb.py:

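import wandb

# Start a run with the default settings
run = wandb.init()

# Declare the artifact as an input of this run, using its full name
artifact_data = run.use_artifact('<entity>/<project>/<artifact-name>')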

Sagemaker error stack trace (obfuscating <entity>/<project>):

ds08ejga2i-algo-1-98ks9 | Invoking script with the following command:
ds08ejga2i-algo-1-98ks9 | 
ds08ejga2i-algo-1-98ks9 | /opt/conda/bin/python3.6 test_wandb.py --accelerator  --data_dir aws --dataset **** --gpus -1 --keypoints synth --max_epochs 80
ds08ejga2i-algo-1-98ks9 | 
ds08ejga2i-algo-1-98ks9 | 
ds08ejga2i-algo-1-98ks9 | wandb: Currently logged in as: rahulraj. Use `wandb login --relogin` to force relogin
ds08ejga2i-algo-1-98ks9 | wandb: wandb version 0.15.7 is available!  To upgrade, please run:
ds08ejga2i-algo-1-98ks9 | wandb:  $ pip install wandb --upgrade
ds08ejga2i-algo-1-98ks9 | wandb: Tracking run with wandb version 0.13.7
ds08ejga2i-algo-1-98ks9 | wandb: Run data is saved locally in /opt/ml/code/wandb/run-20230726_200834-pytorch-training-2023-07-26-20-08-08-111-jjfnx0-algo-1-98ks9
ds08ejga2i-algo-1-98ks9 | wandb: Run `wandb offline` to turn off syncing.
ds08ejga2i-algo-1-98ks9 | wandb: Syncing run pytorch-training-2023-07-26-20-08-08-111-jjfnx0-algo-1-98ks9
ds08ejga2i-algo-1-98ks9 | wandb: ⭐️ View project at https://wandb.ai/rahulraj/uncategorized
ds08ejga2i-algo-1-98ks9 | wandb: 🚀 View run at https://wandb.ai/rahulraj/uncategorized/runs/pytorch-training-2023-07-26-20-08-08-111-jjfnx0-algo-1-98ks9
ds08ejga2i-algo-1-98ks9 | wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
ds08ejga2i-algo-1-98ks9 | wandb: ERROR Project ***/*** does not contain artifact: "whole_training_data_unified:v10"
ds08ejga2i-algo-1-98ks9 | Traceback (most recent call last):
ds08ejga2i-algo-1-98ks9 |   File "/opt/conda/lib/python3.6/site-packages/wandb/apis/normalize.py", line 26, in wrapper
ds08ejga2i-algo-1-98ks9 |     return func(*args, **kwargs)
ds08ejga2i-algo-1-98ks9 |   File "/opt/conda/lib/python3.6/site-packages/wandb/apis/public.py", line 941, in artifact
ds08ejga2i-algo-1-98ks9 |     artifact = Artifact(self.client, entity, project, artifact_name)
ds08ejga2i-algo-1-98ks9 |   File "/opt/conda/lib/python3.6/site-packages/wandb/apis/public.py", line 4315, in __init__
ds08ejga2i-algo-1-98ks9 |     self._load()
ds08ejga2i-algo-1-98ks9 |   File "/opt/conda/lib/python3.6/site-packages/wandb/apis/public.py", line 5028, in _load
ds08ejga2i-algo-1-98ks9 |     % (self.entity, self.project, self._artifact_name)
ds08ejga2i-algo-1-98ks9 | ValueError: Project uplift/lightning-3d-pose_DTL does not contain artifact: "whole_training_data_unified:v10"

Adding the API key in the docker seems to resolve the issue as I am able to download the artifact.

So the secrets.env approach seems to be the cause of the issue. Although it is able to log in and create the runs in the W&B account, it’s not able to download the artifact. Any idea why?

@rahulraj Glad to hear the issue is resolved. My thinking is this: without a valid API key, access to certain artifacts may be restricted. By providing the API key in the Docker environment, you are enabling the SageMaker Estimator to authenticate and access the artifacts (which you need for .download).

I wouldn’t say the issue has been resolved, as I don’t want to put the API key in the docker file.
I follow the recommended steps of calling wandb.sagemaker_auth(), and the secrets.env file is created correctly.
wandb has no problems creating the run logs as I can see them on the dashboard, so the authentication must be happening.
When it comes to downloading the artifact, however, there is an issue.
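
One thing that might be worth trying, as an untested workaround rather than a confirmed fix: load the keys from secrets.env into the process environment before calling wandb.init(), so that any code path that reads WANDB_API_KEY from the environment can see it. The sketch below assumes secrets.env contains plain KEY=VALUE lines; the helper name load_secrets_env is made up for illustration.

```python
import os
from pathlib import Path


def load_secrets_env(path="secrets.env", environ=None):
    """Copy KEY=VALUE pairs from a secrets.env file into the environment.

    Assumes one KEY=VALUE pair per line (an assumption about the file format).
    Values already present in the environment are left untouched.
    Returns the list of key names found in the file.
    """
    environ = os.environ if environ is None else environ
    found = []
    secrets = Path(path)
    if not secrets.is_file():
        return found
    for raw in secrets.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        environ.setdefault(key, value.strip())
        found.append(key)
    return found


# Intended use inside the training script (not executed here):
#   load_secrets_env("secrets.env")
#   run = wandb.init()
#   run.use_artifact("<entity>/<project>/<artifact-name>")
```

This keeps the API key out of the Docker image; whether it changes the artifact lookup depends on where the underlying bug actually is.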

Hey @rahulraj, this is a known issue and I’ll be sure to add your information to the bug report. I will update you as progress arises on the issue. What SDK version are you on?
