How to log training samples per second with the Hugging Face Trainer?

Hi,

I’m using the Hugging Face Seq2SeqTrainer in a setup similar to this script: qlora/qlora.py in artidoro/qlora on GitHub.
It logs to wandb through the trainer argument report_to="wandb".

I’d like to log the time taken to train on a single sample in the dataset. How can I do that?

Progress so far

  1. Configure transformers.Seq2SeqTrainer to log it directly. trainer.evaluate automatically logs eval/samples_per_second, but I couldn’t find a way to configure the trainer to do the same for training. Does anyone know how?

  2. Derive metrics in wandb. I can plot global_step against time in the wandb dashboard, which gives the time per step, and dividing that by the batch size would give the time per sample. However, I haven’t found a way to derive metrics in wandb. Does anyone know the best way to do that?
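Option 2 amounts to a difference quotient over the logged rows: the seconds elapsed between two logged steps, divided by the number of samples those steps consumed. With the Trainer, one global step is one optimizer update, i.e. per_device_train_batch_size * gradient_accumulation_steps * number of devices samples. A minimal sketch, assuming the rows have been exported as (global_step, elapsed seconds) pairs, for example from the `_runtime` column of the wandb run history; the function name and the numbers are hypothetical.

```python
from typing import List, Tuple


def time_per_sample(
    history: List[Tuple[int, float]],
    per_device_batch_size: int,
    grad_accum_steps: int = 1,
    num_devices: int = 1,
) -> List[Tuple[int, float]]:
    """Turn (global_step, elapsed_seconds) rows into seconds per sample.

    One optimizer step consumes
    per_device_batch_size * grad_accum_steps * num_devices samples.
    """
    samples_per_step = per_device_batch_size * grad_accum_steps * num_devices
    rows = sorted(history)
    result = []
    for (prev_step, prev_t), (step, t) in zip(rows, rows[1:]):
        steps = step - prev_step
        if steps <= 0:
            continue  # skip rows logged at the same step (e.g. eval rows)
        result.append((step, (t - prev_t) / (steps * samples_per_step)))
    return result


# Hypothetical export: a row every 10 steps with elapsed seconds.
history = [(10, 20.0), (20, 40.0), (30, 61.0)]
print(time_per_sample(history, per_device_batch_size=4, grad_accum_steps=4))
# [(20, 0.125), (30, 0.13125)]
```

Each value is an average over the interval ending at that step, so it smooths over per-batch noise but also hides short stalls.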

Hey cerebral,

You can get this by measuring the time taken to train on a single sample yourself and logging it to Weights & Biases (wandb) during training. Here is a brief outline:

  • Use a timer to measure the time taken by each iteration of your training loop.

  • Divide that time by the number of samples processed in the iteration to get the time spent on each sample.

  • Log this metric to wandb with the matching step number using wandb.log().

  • Repeat these steps for every training iteration.

To give you an idea, here is an example snippet:

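import time
import wandb

# Initialize wandb
wandb.init(project="your-project-name")

global_step = 0

# Your training loop (training_dataloader, model and optimizer come from your setup)
for batch in training_dataloader:
    start_time = time.perf_counter()

    # Training logic here: forward pass, backward pass, optimizer step.
    # On a GPU, call torch.cuda.synchronize() before reading the clock,
    # otherwise asynchronous kernels make the step look faster than it is.

    end_time = time.perf_counter()
    # Batches from a Hugging Face data collator are dicts, so len(batch) would
    # count keys; use the row count of one tensor instead (assumes "input_ids").
    num_samples = batch["input_ids"].shape[0]
    time_per_sample = (end_time - start_time) / num_samples
    global_step += 1

    # Log time per sample to wandb
    wandb.log({"time_per_sample": time_per_sample}, step=global_step)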

Replace “your-project-name” with the name of your actual project and adapt the code to your training setup. This lets you record the time spent training on a single sample from your dataset. If you need more help, reply to this thread.
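With Seq2SeqTrainer you don’t own the training loop, so a timer like the one above would have to live in a TrainerCallback instead, whose hooks include on_step_begin, on_step_end and on_log. Note that trainer.train() already reports train_samples_per_second in its returned metrics, but only once, averaged over the whole run. Below is a sketch of only the bookkeeping such a callback would need, kept free of transformers so it runs anywhere; the class name ThroughputMeter and the metric keys are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class ThroughputMeter:
    """Accumulate step time and sample counts between two log events.

    Meant to be driven from callback hooks: start() at step begin,
    stop(n) at step end, flush() when the trainer logs.
    """

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self._start: Optional[float] = None
        self._seconds = 0.0
        self._samples = 0

    def start(self) -> None:
        self._start = self._clock()

    def stop(self, num_samples: int) -> None:
        if self._start is None:
            return  # stop() without a matching start(): ignore it
        self._seconds += self._clock() - self._start
        self._samples += num_samples
        self._start = None

    def flush(self) -> Dict[str, float]:
        """Return metrics for the window since the last flush, then reset."""
        if self._samples == 0 or self._seconds == 0.0:
            return {}
        metrics = {
            "train/samples_per_second": self._samples / self._seconds,
            "train/seconds_per_sample": self._seconds / self._samples,
        }
        self._seconds, self._samples = 0.0, 0
        return metrics


# Usage with a fake clock: two steps of 2 s for 16 samples each;
# the 8 s gap between them (e.g. evaluation) is not counted.
ticks = iter([0.0, 2.0, 10.0, 12.0])
meter = ThroughputMeter(clock=lambda: next(ticks))
meter.start(); meter.stop(16)
meter.start(); meter.stop(16)
print(meter.flush())
# {'train/samples_per_second': 8.0, 'train/seconds_per_sample': 0.125}
```

To wire it up, a callback could call start() in on_step_begin, stop(n) in on_step_end with n = args.per_device_train_batch_size * args.gradient_accumulation_steps * args.world_size, and pass the flushed dict to wandb.log in on_log. How that extra wandb.log call lines up with the steps the built-in WandbCallback already logs may depend on your transformers and wandb versions, so check the resulting chart before relying on it.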

Thanks
(Marcos)


Thanks @marcosandrew for following up with @cerebral above. We really appreciate your response here in the community forum.

Hi @cerebral, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi, since we have not heard back from you, we are going to close this request. If you would like to reopen the conversation, please let us know! Unfortunately, at the moment, we do not receive notifications if a thread reopens on Discourse. So, please feel free to create a new ticket regarding your concern if you’d like to continue the conversation.