Visualizing PBT in the W&B dashboard

I’ve been fine-tuning a Hugging Face model with Ray Tune’s population-based training (PBT), and I’d like to visualize the results in W&B. However, W&B doesn’t seem to show every version of each model. I’d like to track how the models develop over time. Is there a way to do this?
Here is a screenshot of what my dashboard looks like after the runs complete (2-5 epochs, depending on the model):

[screenshot of the W&B dashboard]

The seventh model has data for training steps 0 and 1, but there should be many more training steps. All of the other models only have data for step 0. The hyperparameter values should also vary across steps, since PBT perturbs them every iteration, but they don’t.
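
As a sanity check that the per-iteration data exists at all outside of W&B, I plan to read the Tune results straight from disk once the runs finish. A minimal sketch, assuming the storage_path and experiment name from the code below:

from ray.tune import ExperimentAnalysis

# Load the finished PBT experiment from local storage.
analysis = ExperimentAnalysis("/kaggle/working/ray_results/tune_transformer_pbt")

# One results DataFrame per trial; if these contain a row for every
# training_iteration, the data exists and the gap is on the W&B side.
for trial_dir, df in analysis.trial_dataframes.items():
    print(trial_dir)
    print(df[["training_iteration", "eval_rouge1"]])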
Here are the options I used for the hyperparameter search. I don’t have a repro yet, but I can work on creating one.

from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (
    GenerationConfig,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

training_args = Seq2SeqTrainingArguments(
    output_dir=".",
    report_to="wandb",
    learning_rate=1e-3,  # starting value; perturbed by PBT
    warmup_steps=0,  # starting value; perturbed by PBT
    weight_decay=0.1,  # starting value; perturbed by PBT
    num_train_epochs=2,  # overridden by tune_config
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    save_total_limit=5,
    max_steps=-1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=4,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # trying non-reentrant checkpointing
    predict_with_generate=True,
    generation_max_length=45,
    generation_config=GenerationConfig(min_length=16, max_length=45),
    use_cpu=(gpus_per_trial <= 0),
    fp16=(gpus_per_trial >= 1),
    logging_strategy="epoch",
    logging_dir="./logs",
)
trainer = Seq2SeqTrainer(
    model_init=get_model,
    args=training_args,
    train_dataset=tokenized_tweets["train"],
    eval_dataset=tokenized_tweets["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
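
(For context: the helpers referenced above — get_model, data_collator, compute_metrics, and the tokenized_tweets DatasetDict — aren’t shown. Roughly, they look like the following. This is a simplified sketch, not my exact code: the t5-small checkpoint and the ROUGE metric are guesses based on the project name and the eval_rouge* columns.)

import evaluate
import numpy as np
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
)

model_name = "t5-small"  # placeholder; any seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
rouge = evaluate.load("rouge")

def get_model():
    # model_init: Tune re-instantiates the model for each trial
    return AutoModelForSeq2SeqLM.from_pretrained(model_name)

data_collator = DataCollatorForSeq2Seq(tokenizer)

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # Replace the -100 used for ignored positions before decoding.
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    result["gen_len"] = float(
        np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    )
    return result
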
tune_config = {
    "num_train_epochs": tune.choice([2, 3, 4, 5]),
    "max_steps": 1 if smoke_test else -1,  # Used for smoke test.
}

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_rouge1",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.0, 0.1),
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "warmup_steps": tune.randint(0, 4),
    },
)

reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "warmup_steps": "warmup",
        "num_train_epochs": "num_epochs",
    },
    metric_columns=["eval_loss", "eval_rouge1", "eval_rouge2", "eval_rougeL", 
                    "eval_gen_len", "eval_runtime", "epoch", "training_iteration"],
)
best_result = trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=num_samples,
    resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
    scheduler=scheduler,
    checkpoint_score_attr="training_iteration",
    stop={"training_iteration": 1} if smoke_test else None,
    progress_reporter=reporter,
    storage_path="/kaggle/working/ray_results/",
    name="tune_transformer_pbt",
    log_to_file=True,
    callbacks=[WandbLoggerCallback(
        project="t5_pbt_0", 
        group="t5_pbt_0",
        log_config=True, 
        upload_checkpoints=True,
    )],
    checkpoint_config=train.CheckpointConfig(num_to_keep=4),
)
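
One idea I’m considering so the mutated hyperparameters show up as time series in W&B rather than static config values: a small TrainerCallback that logs them on every log event. This is an untested sketch; it assumes hyperparameter_search applies the PBT-perturbed values to the trainer’s args, and that report_to="wandb" keeps an active wandb run inside each worker:

import wandb
from transformers import TrainerCallback

class LogMutableHyperparams(TrainerCallback):
    """Log the (possibly PBT-perturbed) hyperparameters at every log step
    so they appear as time series instead of static config values."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if wandb.run is not None:
            wandb.log(
                {
                    "mutable/learning_rate": args.learning_rate,
                    "mutable/weight_decay": args.weight_decay,
                    "mutable/warmup_steps": args.warmup_steps,
                },
                step=state.global_step,
            )

# Register before launching the search:
trainer.add_callback(LogMutableHyperparams())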

Hey @josiahgottfried, thanks for flagging this! It would be really helpful if you could provide the repro code when you have it ready, so I can test on my end and see what’s happening here.

Hey @josiahgottfried, following up here! Would it be possible to share the repro code so we can test on our side?

Hi Josiah, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!