I’ve been fine-tuning a Hugging Face model with Ray Tune’s Population Based Training (PBT), and I’d like to visualize the results in W&B. However, W&B doesn’t seem to show every version of each model, and I’d like to track how the models develop over time. Is there a way to do this?
Here is a screenshot of what my dashboard looks like after runs are complete (2-5 epochs, depending on the model).
The seventh model has data for training steps 0 and 1, but there should be many more training steps. All the other models have data for step 0 only. The configuration variables should also vary across steps, since PBT perturbs them, but they don’t.
Here are the options I used for the hyperparameter search. I don’t have a repro yet, but I can work on creating one.
# Imports this snippet relies on (Ray 2.x module paths).
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import GenerationConfig, Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir=".",
    report_to="wandb",
    learning_rate=1e-3,  # config
    warmup_steps=0,  # config
    weight_decay=0.1,  # config
    num_train_epochs=2,  # config
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    save_total_limit=5,
    max_steps=-1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=4,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # try
    predict_with_generate=True,
    generation_max_length=45,
    generation_config=GenerationConfig(min_length=16, max_length=45),
    use_cpu=(gpus_per_trial <= 0),
    fp16=(gpus_per_trial >= 1),
    logging_strategy="epoch",
    logging_dir="./logs",
)
trainer = Seq2SeqTrainer(
    model_init=get_model,
    args=training_args,
    train_dataset=tokenized_tweets["train"],
    eval_dataset=tokenized_tweets["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
tune_config = {
    "num_train_epochs": tune.choice([2, 3, 4, 5]),
    "max_steps": 1 if smoke_test else -1,  # Used for smoke test.
}
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_rouge1",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.0, 0.1),
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "warmup_steps": tune.randint(0, 4),
    },
)
reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "warmup_steps": "warmup",
        "num_train_epochs": "num_epochs",
    },
    metric_columns=[
        "eval_loss", "eval_rouge1", "eval_rouge2", "eval_rougeL",
        "eval_gen_len", "eval_runtime", "epoch", "training_iteration",
    ],
)
best_result = trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=num_samples,
    resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
    scheduler=scheduler,
    checkpoint_score_attr="training_iteration",
    stop={"training_iteration": 1} if smoke_test else None,
    progress_reporter=reporter,
    storage_path="/kaggle/working/ray_results/",
    name="tune_transformer_pbt",
    log_to_file=True,
    callbacks=[WandbLoggerCallback(
        project="t5_pbt_0",
        group="t5_pbt_0",
        log_config=True,
        upload_checkpoints=True,
    )],
    checkpoint_config=train.CheckpointConfig(num_to_keep=4),
)
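For reference, here is a rough back-of-the-envelope sketch of how many data points I would expect per trial with the settings above. It is only an estimate: `expected_points` is a hypothetical helper of mine, the dataset size is a made-up placeholder, and I am assuming that each evaluation epoch produces one report to Ray Tune (and so one point in W&B). The step count rounds up and may differ slightly from what the Trainer computes internally.

```python
import math


def expected_points(num_examples, num_train_epochs,
                    per_device_batch=4, grad_accum=8, num_devices=1):
    """Rough estimate of the per-trial logging volume (hypothetical helper)."""
    # Examples consumed per optimizer step.
    effective_batch = per_device_batch * grad_accum * num_devices
    # Rounded-up estimate; the Trainer's own count may differ by a step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return {
        "effective_batch": effective_batch,
        "optimizer_steps": steps_per_epoch * num_train_epochs,
        # Assumption: one report per evaluation epoch.
        "epoch_level_reports": num_train_epochs,
    }


# 10_000 is a placeholder dataset size, not my real one.
print(expected_points(num_examples=10_000, num_train_epochs=2))
```

So even under the most conservative reading (one point per epoch), each trial should show between 2 and 5 points rather than just step 0.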