I used wandb to record the loss while training a model in PyTorch. The model was trained for 9300 steps over 100 epochs, and the loss was recorded at every step. Now I want to calculate the average loss for every epoch, that is, the average loss over each non-overlapping block of 93 steps. This would group the recorded losses and return 100 values, one average loss per epoch. However, I can't find a way to do this in the current wandb. I can only find the "running average" option under "Smoothing", but that is not what I need, since it averages the nearest 93 losses around each step and produces a value for every step. How can I calculate the average loss for every epoch using wandb?
Hi @leyangjin, happy to help. To accomplish this, you will have to calculate the per-epoch averages yourself and log them to wandb. You can do this in either of two ways:
- During the run: keep track of the loss values locally, compute the average at the end of each epoch, and log it while the run is still active.
- After the run: use the API to pull your run history and compute the averages from it, for example:
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
# scan_history() iterates over every logged row (no sampling)
history = run.scan_history(keys=["train_loss"])
losses = [row["train_loss"] for row in history]
# Business logic to calculate the new average loss and log data to wandb
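To make the "business logic" step concrete, here is a minimal sketch of one way it could look. It assumes the loss was logged under the key "train_loss" and that every epoch has exactly 93 steps, as in the question. The helper names `epoch_averages` and `log_epoch_averages`, the metric name "epoch_avg_loss", and the run name "epoch-averages" are made up for illustration, and writing the averages to a separate new run is just one option, not the only way to do it.

```python
from statistics import fmean


def epoch_averages(losses, steps_per_epoch):
    """Average consecutive, non-overlapping blocks of `steps_per_epoch` losses.

    A trailing partial block (fewer than `steps_per_epoch` values) is dropped.
    """
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    n_epochs = len(losses) // steps_per_epoch
    return [
        fmean(losses[i * steps_per_epoch:(i + 1) * steps_per_epoch])
        for i in range(n_epochs)
    ]


def log_epoch_averages(run_path, project, key="train_loss", steps_per_epoch=93):
    """Hypothetical helper: pull per-step losses and log per-epoch means."""
    import wandb  # imported here so epoch_averages works without wandb installed

    run = wandb.Api().run(run_path)
    losses = [row[key] for row in run.scan_history(keys=[key])]
    averages = epoch_averages(losses, steps_per_epoch)

    # Assumption: the averages are written to a new, separate run
    # instead of being added to the original run.
    with wandb.init(project=project, name="epoch-averages") as new_run:
        for epoch, value in enumerate(averages, start=1):
            new_run.log({"epoch": epoch, "epoch_avg_loss": value})
    return averages
```

Calling `log_epoch_averages("<entity>/<project>/<run_id>", project="<project>")` would then produce 100 values for a 9300-step run. The same `epoch_averages` helper can also be used during training if you collect the step losses in a list and log the mean at the end of each epoch.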
Hi @leyangjin, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!