Setting the WANDB_PROJECT environment variable does not set the project name

Hi everyone, I am using wandb with Hugging Face in an AWS SageMaker notebook, following the tutorial here: Hugging Face Transformers | Weights & Biases Documentation.

I tried to set the WANDB_PROJECT environment variable before setting up the huggingface_estimator, which will call train.py.

train.py is where I initialize the Trainer. The tutorial above says to set the project name before initializing the Trainer, and I believe I am doing that here.

Here are some useful snippets of my code.

from sagemaker.huggingface import HuggingFace
import wandb

wandb.login()

WANDB_PROJECT=my_project_name

...

huggingface_estimator = HuggingFace(
  image_uri=image_uri,
  entry_point='train.py',
  source_dir='./scripts',
  instance_type='ml.g4dn.xlarge',
  instance_count=1,
  role=role,
  py_version='py39',
  hyperparameters=hyperparameters,
)

train.py

    training_args = TrainingArguments(
        output_dir=args.output_dir,
        per_device_train_batch_size=args.per_device_train_batch_size,
        num_train_epochs=args.epochs,
        learning_rate=args.learning_rate,
        save_strategy="epoch",
        logging_strategy='epoch',
        report_to="wandb",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=collate_fn,
        tokenizer=image_processor,
    )

    trainer.train()

I would greatly appreciate any guidance or advice on how to resolve this issue. Thank you very much in advance for your help!

Hey @oschan77,

You can set the project name in your script like so:

import os
os.environ["WANDB_PROJECT"] = "sentiment-analysis"
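The difference from the line `WANDB_PROJECT=my_project_name` in the question is worth spelling out: a bare assignment only binds a Python variable, while assigning into `os.environ` changes the process environment, which is where the Transformers W&B integration looks for `WANDB_PROJECT`. This is a standard-library-only sketch; the child process it launches stands in for any process started after the variable is set.

```python
import os
import subprocess
import sys

# Start from a clean slate so the demonstration is deterministic.
os.environ.pop("WANDB_PROJECT", None)

# A bare assignment only creates a Python variable; the environment is unchanged.
WANDB_PROJECT = "my_project_name"
print("after bare assignment:", os.environ.get("WANDB_PROJECT"))

# Assigning into os.environ sets the variable for this process...
os.environ["WANDB_PROJECT"] = "sentiment-analysis"
print("after os.environ:", os.environ.get("WANDB_PROJECT"))

# ...and for child processes started afterwards, which inherit a copy of it.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('WANDB_PROJECT'))"],
    capture_output=True,
    text=True,
    check=True,
)
print("seen by child process:", child.stdout.strip())
```

A process that is not started from this one, such as a SageMaker training job running on its own instance, does not inherit the variable, so setting it inside train.py itself is the reliable option.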

Hi @oschan77, thanks for writing in! WANDB_PROJECT needs to be exported as an environment variable, as Morgan suggested above. Since you mentioned you’re working in an AWS SageMaker notebook, another alternative would be:
%env WANDB_PROJECT=project_name

Please let us know if these would work for you, and if you have any other questions.
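One caveat, hedged because it depends on how the job is launched: `%env` (like `os.environ` in a notebook cell) only changes the notebook kernel's environment, and the training job started by `huggingface_estimator.fit()` runs in a separate container that does not inherit it. A minimal sketch of one way to keep the project name configurable from the notebook is to pass it as a hyperparameter, which SageMaker hands to train.py as a command-line argument, and export it in the script before the Trainer is built. The `wandb_project` hyperparameter and the `configure_wandb_project` helper are hypothetical names.

```python
import argparse
import os


def configure_wandb_project(argv=None):
    """Read a hypothetical --wandb_project argument and export it for W&B.

    In the notebook, the value would travel through the estimator, e.g.
    hyperparameters = {"wandb_project": "my_project_name", ...}
    """
    parser = argparse.ArgumentParser()
    # Default mirrors the project name the Transformers integration falls back to.
    parser.add_argument("--wandb_project", type=str, default="huggingface")
    # parse_known_args ignores the script's other hyperparameters
    # (epochs, learning_rate, ...), which the real parser handles.
    args, _unknown = parser.parse_known_args(argv)
    # Must run before TrainingArguments/Trainer are created.
    os.environ["WANDB_PROJECT"] = args.wandb_project
    return args.wandb_project


# Example: arguments SageMaker might pass for
# hyperparameters={"wandb_project": "my_project_name", "epochs": 3}
project = configure_wandb_project(["--wandb_project", "my_project_name", "--epochs", "3"])
print(project, os.environ["WANDB_PROJECT"])
```

In the real script, calling `configure_wandb_project()` with no arguments makes argparse read `sys.argv`, so it can sit next to the existing argument parsing in train.py.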


I’m having a similar issue. In my case I am using the Hugging Face Trainer class with a Ray backend. The project name is set correctly on the head node, but I think the wandb instances on the worker nodes are falling back to the default project, huggingface, which is very confusing. I’ve tried several variations and am still stuck. I would like to be able to change WANDB_PROJECT each time I launch the code. I am running everything as Docker containers, so as a sanity check I restarted everything, made sure the environment variable was set before starting, and got one run to work. But this is brittle: restarting all my nodes just to change an environment variable is not ideal.

I’ve seen some (confusing) posts saying that wandb ignores certain environment variables if wandb.init() is also called. What happens is that my hyperparameter sweep starts and the first job gets the proper project name, but the 5 other processes (running on other nodes) get the default Hugging Face project name. I realize that backend="ray" may add another layer of complexity, but I didn’t think running Ray on more than one machine should be considered an edge case. I have basically pieced this together from several docs I found online (including "Using 🤗 Huggingface Transformers with Tune" in the Ray 2.3.0 documentation).

import os
import wandb

# Sets the variable in this (driver) process's environment.
os.environ["WANDB_PROJECT"] = "stainParamSweep"

wandb.login()
# Starts a W&B run for the driver process itself.
wandb.init(project="stainParamSweep")

gpus_per_trial = 0.5
best_model = trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    checkpoint_score_attr="training_iteration",
    local_dir="/data/ray_results_tuning/",
    name="tune_transformer_pbt",
    log_to_file=True,
    n_trials=100,
    progress_reporter=reporter,
)
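A likely explanation for the Ray behaviour (an assumption, not verified against this cluster) is that `wandb.init(project=...)` and `os.environ[...]` only affect the driver process, while each trial's Trainer starts its own W&B run on whichever worker it lands on and reads `WANDB_PROJECT` from that worker's environment, falling back to `huggingface` when it is unset. Ray's `runtime_env` can carry environment variables to the job's workers, so the project could be chosen at launch time without restarting containers. This sketch only builds that dictionary with the standard library; the `ray.init` call is left in a comment because it needs a running cluster, and `build_runtime_env` is a hypothetical helper.

```python
import os


def build_runtime_env(project):
    """Return a Ray runtime_env dict that exports WANDB_PROJECT to the job's workers."""
    return {"env_vars": {"WANDB_PROJECT": project}}


# Pick the project per launch, e.g. `WANDB_PROJECT=stainParamSweep python sweep.py`.
project = os.environ.get("WANDB_PROJECT", "stainParamSweep")
runtime_env = build_runtime_env(project)

# With Ray installed, connect to the cluster with this runtime_env *before*
# calling trainer.hyperparameter_search(..., backend="ray"). It has no effect
# if Ray was already initialized earlier in the same process.
#
#   import ray
#   ray.init(address="auto", runtime_env=runtime_env)
#
# Trials scheduled on any node should then start with WANDB_PROJECT set.
print(runtime_env)
```

The design choice here is to pass the project explicitly with the job rather than relying on each container's ambient environment, which is what made the restart-based workaround brittle.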
