Logging and using artifacts in one run

Hello everyone
I am relatively new to using wandb and I’m encountering some issues with logging and then immediately using artifacts within the same run. My goal is to log artifacts and then use them right away, unless there’s a reason I should avoid doing this.

Here’s what I do:

  1. Initialize the run using setup_wandb.
  2. Log the data with the log_data() method and immediately download it again in train_classifier(). This part works, but the data appears only as an output, not as an input, under artifacts in wandb.
  3. Log the model after training and download it again for testing. This does not work, even though the model appears under files in wandb once the run has finished and the status after uploading is COMMITTED. What am I missing?

Here are the relevant code snippets:

def setup_wandb(self, sweep_id, is_sweep):
    run = wandb.init(entity=WANDB_ENTITY,
                     project=PROJECT_NAME,
                     id=run_id,
                     config=json.load(file),
                     notes=f'{self.classifier_name} {self.dataset_name} {self.phoneme_recognizer_name} {self.representation}-grams idf:{self.use_idf}',
                     dir=run_folder,
                     resume="allow",
                     job_type="sweep" if is_sweep else "run")

    print(f"Run initialized with ID: {run_id} in dir: {run_folder}")
    run.name = f"{run_id}"

def log_data(self):
    paths = {
        "train": self.get_split_path('train'),
        "valid": self.get_split_path('valid'),
        "test": self.get_split_path('test')
    }

    data_artifact = wandb.Artifact(
        f'{self.phoneme_recognizer_name}-{self.dataset_name}-dataset',  # Artifact's name
        type="dataset",
        description=f"Preprocessed dataset for {self.phoneme_recognizer_name} {self.dataset_name}, split into train/valid/test",
        metadata={"sizes": {name: os.path.getsize(path) for name, path in paths.items()}}
    )

    for name, path in paths.items():
        data_artifact.add_file(path, name=f"{name}.csv")

    data_artifact_result = wandb.run.log_artifact(data_artifact).wait()
    print(f"data_artifact status: {data_artifact_result.state}")

def train_classifier(self):
    artifact_data = wandb.run.use_artifact(f'{self.phoneme_recognizer_name}-{self.dataset_name}-dataset:latest')

    artifact_data_dir = artifact_data.download()
    print('artifact_data_dir:', artifact_data_dir)

    train_path = os.path.join(artifact_data_dir, 'train.csv')
    valid_path = os.path.join(artifact_data_dir, 'valid.csv')

    df_train = pd.read_csv(train_path)
    df_valid = pd.read_csv(valid_path)

    # ... training of `pipeline` omitted ...

    # Logging model to wandb
    model_artifact = wandb.Artifact(
        'trained_model',  # Artifact's name
        type='model',  # Artifact's type
        description=f'Trained {self.classifier_name} model on {self.phoneme_recognizer_name} transcriptions and {self.dataset_name} dataset'
    )

    with model_artifact.new_file('trained_pipeline.pkl', mode='wb') as file:
        dill.dump(pipeline, file)

    model_artifact_result = wandb.run.log_artifact(model_artifact).wait()
    print(f"model_artifact status: {model_artifact_result.state}")

def test(self):
    data_artifact = wandb.run.use_artifact(f'{self.phoneme_recognizer_name}-{self.dataset_name}-dataset:latest')

    data_artifact_dir = data_artifact.download()
    test_path = os.path.join(data_artifact_dir, 'test.csv')
    df_test = pd.read_csv(test_path)

    model_artifact = wandb.run.use_artifact(f'{WANDB_ENTITY}/{PROJECT_NAME}/trained-model:latest', type='model')
    model_artifact_dir = model_artifact.download()

    model_file_path = os.path.join(model_artifact_dir, 'trained_pipeline.pkl')
    try:
        with open(model_file_path, 'rb') as file:
            pipeline = dill.load(file)
        print("Model successfully loaded.")
    except Exception as e:
        print(f"Failed to load model: {e}")

Here is the error message I receive:

model_artifact status: COMMITTED
wandb:   3 of 3 files downloaded.  
wandb: ERROR Unable to fetch artifact with name <WANDB_ENTITY>/<PROJECT_NAME>/trained-model:latest
Traceback (most recent call last):
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/apis/normalize.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/apis/public/api.py", line 958, in artifact
    artifact = wandb.Artifact._from_name(
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/sdk/artifacts/artifact.py", line 263, in _from_name
    raise ValueError(
ValueError: Unable to fetch artifact with name <WANDB_ENTITY>/<PROJECT_NAME>/trained-model:latest

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/.../.../<myscript>.py", line 64, in <module>
    classifier.test()
  File "/Users/.../.../<myscript>.py", line 330, in test
    model_artifact = wandb.run.use_artifact(f'{WANDB_ENTITY}/{PROJECT_NAME}/trained-model:latest', type='model')
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 371, in wrapper_fn
    return func(self, *args, **kwargs)
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 361, in wrapper
    return func(self, *args, **kwargs)
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 2859, in use_artifact
    artifact = public_api.artifact(type=type, name=name)
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/apis/normalize.py", line 87, in wrapper
    raise CommError(message, err).with_traceback(sys.exc_info()[2])
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/apis/normalize.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/apis/public/api.py", line 958, in artifact
    artifact = wandb.Artifact._from_name(
  File "/Library/anaconda3/envs/MLOps-Dialects/lib/python3.9/site-packages/wandb/sdk/artifacts/artifact.py", line 263, in _from_name
    raise ValueError(
wandb.errors.CommError: Unable to fetch artifact with name XXX/XXX-Test/trained-model:latest

This is a reply from our support bot, designed to assist you with your WandB-related queries.
To reach a human, please reply to this message.

Logging and using artifacts within the same run in Weights & Biases (W&B) is a common workflow, especially when you want to track datasets, models, or any other files as part of your machine learning experiments. Here are some insights and solutions to the issues you’re facing:

1. Logging Artifacts as Inputs and Outputs

When you log an artifact and then use it within the same run, it should ideally appear as both an input and an output in the W&B UI. The key to achieving this is to ensure that the artifact is logged as an output first and then used (downloaded or referenced) as an input. From your description, it seems you’re doing this correctly. However, if it only appears as an output, this might be a timing or synchronization issue. Ensure that the artifact is fully uploaded and marked as COMMITTED before attempting to use it as an input.

2. Logging and Using Model Artifacts

To log a model artifact and then download it for testing within the same run, follow these steps:

  • Log the model artifact: After training your model, create a model artifact, add your model file to it, and log it using wandb.run.log_artifact(model_artifact).wait(). The .wait() method ensures that the artifact is fully uploaded before proceeding.

  • Use the model artifact: To use the model artifact, reference it by its name and version (e.g., latest). Ensure you’re using the correct artifact name and that the artifact has been committed. If you encounter an error fetching the artifact, it might be due to the artifact not being fully uploaded or a mismatch in the artifact name/version.

3. Error Fetching Artifact

The error you’re encountering (Unable to fetch artifact with name <WANDB_ENTITY>/<PROJECT_NAME>/trained-model:latest) suggests a few potential issues:

  • Artifact Name or Version Mismatch: Double-check the artifact name and version. Ensure that the artifact has been committed and is available in your project.

  • Synchronization Issue: There might be a delay between when the artifact is logged and when it becomes available for use. The .wait() method should mitigate this, but network delays or API issues could still cause timing problems.

  • Access Permissions: Ensure that your run has the appropriate permissions to access the artifact. This is usually not an issue within the same run, but it’s something to consider if you’re working in a team or organization with restricted access controls.
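On the name-mismatch point: in the snippets above the model is logged as trained_model but fetched as trained-model, which is exactly the kind of mismatch this error indicates. Defining the name once and deriving every reference from it rules that out; a pure-Python sketch (the entity and project values are placeholders):

```python
MODEL_ARTIFACT_NAME = "trained_model"        # single source of truth
ENTITY, PROJECT = "my-entity", "my-project"  # placeholders


def model_reference(alias="latest"):
    """Build the fully qualified reference passed to use_artifact()."""
    return f"{ENTITY}/{PROJECT}/{MODEL_ARTIFACT_NAME}:{alias}"


log_name = MODEL_ARTIFACT_NAME  # what wandb.Artifact(...) receives
fetch_ref = model_reference()   # what run.use_artifact(...) receives
# Both sides now derive from the same constant, so they cannot drift apart.
assert fetch_ref.rsplit(":", 1)[0].endswith(log_name)
print(fetch_ref)  # my-entity/my-project/trained_model:latest
```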

Solutions

  • Ensure Artifact Commitment: Use the .wait() method after logging artifacts to ensure they’re fully uploaded.

  • Verify Artifact Names and Versions: Double-check the names and versions of the artifacts you’re trying to use. Ensure consistency and accuracy.

  • Debugging: Add more logging around your artifact usage to capture the state of the artifact at the time of the error. This can help identify timing or synchronization issues.

  • W&B Support: If the issue persists, consider reaching out to W&B support or the community forums with specific details about your run and the errors you’re encountering.
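For the name-verification step, a quick format check before calling use_artifact() can catch malformed references early. A standard-library-only sketch; the regex and helper are my own rough approximation of the entity/project/name:alias pattern, not a W&B API:

```python
import re

# name:alias, project/name:alias, or entity/project/name:alias
_REF_RE = re.compile(r"^(?:[\w.-]+/){0,2}[\w.-]+:[\w.-]+$")


def looks_like_artifact_ref(ref):
    """Rough sanity check on an artifact reference string."""
    return bool(_REF_RE.match(ref))


print(looks_like_artifact_ref("my-entity/my-project/trained_model:latest"))  # True
print(looks_like_artifact_ref("trained_model"))  # False: the alias is missing
```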

Code Snippets

Your code snippets seem well-structured for logging and using artifacts within the same run. Ensure that all artifact-related operations (logging and using) are correctly synchronized, and consider adding more logging to capture the state of artifacts at each step of your workflow.

Remember, W&B artifacts are designed to streamline the tracking of datasets, models, and other files in your ML experiments. Ensuring proper logging, commitment, and usage of artifacts can significantly enhance reproducibility and traceability in your projects.

-WandBot :robot:

Thanks, but I’m already using the .wait() method. The artifact name was wrong: I logged the model as trained_model but tried to fetch trained-model.