Visualizing PBT in the W&B dashboard

I’ve been fine-tuning a Hugging Face model with Ray Tune’s population-based training (PBT), and I’d like to visualize the results in W&B. However, W&B doesn’t seem to show every version of each model. I’d like to track how the models develop over time. Is there a way to do this?
Here is a screenshot of what my dashboard looks like after the runs complete (2-5 epochs, depending on the model):

[screenshot of the W&B dashboard]

The seventh model has data for training steps 0 and 1, but there should be many more training steps. All of the other models only have data for step 0. The hyperparameter values should also vary across steps, since PBT perturbs them every iteration, but they don’t.
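
As a sanity check that the per-iteration data exists at all outside of W&B, I plan to read the Tune results straight from disk once the runs finish. A minimal sketch, assuming the storage_path and experiment name from the code below:

from ray.tune import ExperimentAnalysis

# Load the finished PBT experiment from local storage.
analysis = ExperimentAnalysis("/kaggle/working/ray_results/tune_transformer_pbt")

# One results DataFrame per trial; if these contain a row for every
# training_iteration, the data exists and the gap is on the W&B side.
for trial_dir, df in analysis.trial_dataframes.items():
    print(trial_dir)
    print(df[["training_iteration", "eval_rouge1"]])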
Here are the options I used for the hyperparameter search. I don’t have a repro yet, but I can work on creating one.

from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (
    GenerationConfig,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

training_args = Seq2SeqTrainingArguments(
    output_dir=".",
    report_to="wandb",
    learning_rate=1e-3,  # starting value; perturbed by PBT
    warmup_steps=0,  # starting value; perturbed by PBT
    weight_decay=0.1,  # starting value; perturbed by PBT
    num_train_epochs=2,  # overridden by tune_config
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    save_total_limit=5,
    max_steps=-1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=4,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # trying non-reentrant checkpointing
    predict_with_generate=True,
    generation_max_length=45,
    generation_config=GenerationConfig(min_length=16, max_length=45),
    use_cpu=(gpus_per_trial <= 0),
    fp16=(gpus_per_trial >= 1),
    logging_strategy="epoch",
    logging_dir="./logs",
)
trainer = Seq2SeqTrainer(
    model_init=get_model,
    args=training_args,
    train_dataset=tokenized_tweets["train"],
    eval_dataset=tokenized_tweets["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
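
(For context: the helpers referenced above — get_model, data_collator, compute_metrics, and the tokenized_tweets DatasetDict — aren’t shown. Roughly, they look like the following. This is a simplified sketch, not my exact code: the t5-small checkpoint and the ROUGE metric are guesses based on the project name and the eval_rouge* columns.)

import evaluate
import numpy as np
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
)

model_name = "t5-small"  # placeholder; any seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
rouge = evaluate.load("rouge")

def get_model():
    # model_init: Tune re-instantiates the model for each trial
    return AutoModelForSeq2SeqLM.from_pretrained(model_name)

data_collator = DataCollatorForSeq2Seq(tokenizer)

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # Replace the -100 used for ignored positions before decoding.
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    result["gen_len"] = float(
        np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    )
    return result
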
tune_config = {
    "num_train_epochs": tune.choice([2, 3, 4, 5]),
    "max_steps": 1 if smoke_test else -1,  # Used for smoke test.
}

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_rouge1",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.0, 0.1),
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "warmup_steps": tune.randint(0, 4),
    },
)

reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "warmup_steps": "warmup",
        "num_train_epochs": "num_epochs",
    },
    metric_columns=["eval_loss", "eval_rouge1", "eval_rouge2", "eval_rougeL", 
                    "eval_gen_len", "eval_runtime", "epoch", "training_iteration"],
)
best_result = trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=num_samples,
    resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
    scheduler=scheduler,
    checkpoint_score_attr="training_iteration",
    stop={"training_iteration": 1} if smoke_test else None,
    progress_reporter=reporter,
    storage_path="/kaggle/working/ray_results/",
    name="tune_transformer_pbt",
    log_to_file=True,
    callbacks=[WandbLoggerCallback(
        project="t5_pbt_0", 
        group="t5_pbt_0",
        log_config=True, 
        upload_checkpoints=True,
    )],
    checkpoint_config=train.CheckpointConfig(num_to_keep=4),
)
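
One idea I’m considering so the mutated hyperparameters show up as time series in W&B rather than static config values: a small TrainerCallback that logs them on every log event. This is an untested sketch; it assumes hyperparameter_search applies the PBT-perturbed values to the trainer’s args, and that report_to="wandb" keeps an active wandb run inside each worker:

import wandb
from transformers import TrainerCallback

class LogMutableHyperparams(TrainerCallback):
    """Log the (possibly PBT-perturbed) hyperparameters at every log step
    so they appear as time series instead of static config values."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if wandb.run is not None:
            wandb.log(
                {
                    "mutable/learning_rate": args.learning_rate,
                    "mutable/weight_decay": args.weight_decay,
                    "mutable/warmup_steps": args.warmup_steps,
                },
                step=state.global_step,
            )

# Register before launching the search:
trainer.add_callback(LogMutableHyperparams())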

Hey @josiahgottfried, thanks for flagging this! It would be really helpful if you could provide the repro code when you have it ready, so I can test on my end and see what’s happening here.

Hey @josiahgottfried, following up here! Would it be possible to share the repro code so we can test on our side?

Hi Josiah, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!