Wandb for Huggingface Trainer saves only first model

kgarg8 · April 20, 2022, 7:18am

I am finetuning multiple models in a for loop, as follows:

for file in os.listdir(args.data_dir):
    finetune(args, file)

However, wandb shows logs only for the first file in data_dir, even though training runs and models are saved for the other files as well. This seems like very strange behavior.

wandb: Synced bertweet-base-finetuned-file1: https://wandb.ai/***/huggingface/runs/***

This is a small snippet of finetuning code with Huggingface:

def finetune(args, file):
    training_args = TrainingArguments(
        output_dir=f'{model_name}-finetuned-{file}',
        overwrite_output_dir=True,
        evaluation_strategy='no',
        num_train_epochs=args.epochs,
        learning_rate=args.lr,
        weight_decay=args.decay,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        fp16=True, # mixed-precision training to boost speed
        save_strategy='no',
        seed=args.seed,
        dataloader_num_workers=4,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset['train'],
        eval_dataset=None,
        data_collator=data_collator,
    )
    trainer.train()
    trainer.save_model()

anmolmann · April 20, 2022, 7:56pm

@kgarg8, you've set save_strategy='no' in your code, which disables checkpoint saving during training. With that setting, only the final model is saved, by the trainer.save_model() call once training is done. You can change it to save_strategy="epoch" and a checkpoint will be saved at the end of every epoch.

Alternatively, to log models to W&B, you can set the environment variable WANDB_LOG_MODEL as described in our docs. Once it is set, any Trainer you initialize from then on will upload its model to your W&B project. Note that the model will be saved to W&B Artifacts as run-{run_name}.
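As a minimal sketch of the environment-variable approach, the variable can be set from Python before any Trainer is created. The value "end" is an assumption about recent transformers releases (upload the model once training finishes); older releases expected "true" instead, so check the docs for your installed version.

```python
import os

# Assumption: recent transformers versions accept "end" (upload the final model),
# "checkpoint" (upload every saved checkpoint) or "false"; older versions used "true".
os.environ["WANDB_LOG_MODEL"] = "end"

# The variable must be set before the Trainer is constructed,
# because the W&B integration reads it when it sets up logging.
print("WANDB_LOG_MODEL =", os.environ["WANDB_LOG_MODEL"])
```

Setting it in the shell with export WANDB_LOG_MODEL=end before launching the script should have the same effect.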

kgarg8 · April 21, 2022, 2:50pm

Calling wandb.init(reinit=True) at the start of each finetuning call and run.finish() at the end made each model log as a separate run on the wandb website.

The working code looks like this:

import os
import wandb

def finetune(args, file):
    # Start a fresh W&B run for this file
    run = wandb.init(reinit=True)
    ...
    # Close the run so the next call starts a new one
    run.finish()

for file in os.listdir(args.data_dir):
    finetune(args, file)

Reference: Launch Experiments with wandb.init - Documentation
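For anyone landing here later, the pattern above can be sketched a bit more fully. The likely cause of the original behavior is that the Trainer's W&B integration reuses an already active run instead of starting a new one, so every later training logs into the first run unless that run is finished. In the sketch below, run_name_for, finetune_all, train_fn and the default project name "huggingface" (taken from the sync URL earlier in the thread) are hypothetical names chosen for illustration, not part of wandb or transformers.

```python
import os


def run_name_for(model_name, file):
    """Build a per-file run name such as 'bertweet-base-finetuned-file1' (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(file))[0]
    return f"{model_name}-finetuned-{stem}"


def finetune_all(data_dir, model_name, train_fn, project="huggingface"):
    """Call train_fn(path) once per file in data_dir, each inside its own W&B run."""
    import wandb  # imported here so the naming helper works without wandb installed

    for file in sorted(os.listdir(data_dir)):
        # Open a new run; a Trainer created inside train_fn should log into it.
        run = wandb.init(project=project, name=run_name_for(model_name, file), reinit=True)
        try:
            train_fn(os.path.join(data_dir, file))
        finally:
            # Always close the run, even if training fails, so the next file gets a fresh one.
            run.finish()


print(run_name_for("bertweet-base", "data/file1.csv"))
```

The try/finally is a design choice: if one finetuning job crashes, its run is still closed, and the remaining files do not end up logging into a stale run.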
