New metrics for old epochs

evgeny-tanhilevich · August 8, 2023, 3:16pm

Hi there! I am using WandB via PyTorch Lightning on an on-premise cluster. I realised that I missed some metrics in previous runs that are already logged to WandB. Is it possible to get the new metrics to show in the dashboard after I recompute them by restoring from the old checkpoints? Computing the metrics from old checkpoints is no problem; getting them to show in the dashboard is the issue.

UPDATE
Here is what I am trying to achieve, in pseudocode:

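```python
import wandb
import torch
import lightning.pytorch as pl

# get_old_run_id_from_command_line, get_old_checkpoint_path, init_trainer,
# init_lightning_module and init_data_module stand in for my own code;
# old_epoch is the epoch whose checkpoint I want to evaluate.
run_id = get_old_run_id_from_command_line()
cp_path = get_old_checkpoint_path(run_id, old_epoch)

# Read the global step at which the checkpoint was saved
cp = torch.load(cp_path, map_location='cpu')
step = cp['global_step']

# Recompute the missing metric by validating the restored checkpoint
trainer = init_trainer()
lm = init_lightning_module()
dm = init_data_module()
trainer.validate(lm, datamodule=dm, ckpt_path=cp_path)

# Resume the old run and log the new metric at the historic step
run = wandb.init(id=run_id, resume="allow")
metrics = {'new_metric': lm.new_metric.compute().to('cpu').item()}
wandb.log(metrics, step=step, commit=True)
run.finish()
```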

At this point I expect to see a new chart titled new_metric in the dashboard for the old run, with a single point at old_epoch. I do see a new chart, but it is empty. In the run summary that wandb prints at the end, I can see that it uses the last epoch of the old run rather than the step I passed in.

Am I missing something here?

nathank · August 10, 2023, 5:47pm

Hi @evgeny-tanhilevich, is it possible that you are attempting to log to a step lower than the last step logged? This isn't currently supported, but it can be worked around by plotting against a different metric using the define_metric workflow shown here.
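A minimal sketch of that workaround, assuming the old run was logged through Lightning's `WandbLogger`, which (as far as I know) logs a `trainer/global_step` key alongside every metric and uses it as the chart x-axis. The idea is to declare a custom step metric with `define_metric(..., step_metric=...)` and log the historic step as an ordinary value instead of passing `step=`, so W&B appends the row at its next internal step while the chart places the point at the old x position. `backfill_metrics` is a hypothetical helper; if your charts use a different key, pass it as `step_key`.

```python
def backfill_metrics(run, values_by_step, metric="new_metric",
                     step_key="trainer/global_step"):
    """Log {historic_step: value} pairs so they are plotted against step_key.

    `run` is the object returned by wandb.init(). No `step=` argument is
    passed to run.log, so W&B appends each row at its next internal step;
    the x position on the chart comes from `step_key` instead.
    Assumption: step_key matches the x-axis key the old charts already use.
    """
    # Declare the custom x-axis and bind the back-filled metric to it
    run.define_metric(step_key)
    run.define_metric(metric, step_metric=step_key)
    # Log in ascending order of the historic step
    for step, value in sorted(values_by_step.items()):
        run.log({metric: value, step_key: step})


# Example usage (hypothetical run id and values):
#   import wandb
#   run = wandb.init(id="abc123", resume="allow")
#   backfill_metrics(run, {1200: 0.81, 2400: 0.84})
#   run.finish()
```

Logging the old step as data sidesteps the monotonic-step rule, because W&B's own step keeps increasing while only the plotted x-axis goes back in time.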

evgeny-tanhilevich · August 10, 2023, 8:38pm

Hi @nathank, thanks for getting back to me. Yes, you are correct: I am trying to log to a historic step (an old epoch). I have checked the documentation for define_metric, as you suggested, but I am not sure this workaround will work for me, as I cannot see how to define the correspondence between my new metric (e.g. validation_epoch) and the old epochs that are already logged. I could try other workarounds, such as creating a new run and grouping it with the old one (explained here), but that all seems too complicated for what I need. For now I am fine with just logging my torch metrics to a CSV file.
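The CSV route can be sketched with the standard library alone. `append_metrics_csv` and the `epoch`/`global_step` column names are hypothetical choices, not part of any library; the sketch assumes every call passes the same metric names, because the header is written only once when the file is new.

```python
import csv
from pathlib import Path


def append_metrics_csv(path, epoch, step, metrics):
    """Append one row of recomputed metrics to a CSV file.

    A header row is written only if the file is new or empty, so every
    call is assumed to use the same set of metric names.
    """
    path = Path(path)
    row = {"epoch": epoch, "global_step": step, **metrics}
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


# Example: one call per restored checkpoint, e.g.
#   append_metrics_csv("new_metrics.csv", old_epoch, step,
#                      {"new_metric": lm.new_metric.compute().item()})
```

Appending one row per checkpoint keeps the script restartable: re-running it for another old epoch simply adds a line.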

mohammadbakir · August 11, 2023, 9:27pm

Hi @evgeny-tanhilevich, please note that we have an active feature request for modifying run history at a particular step. We will provide an update once a decision has been made on whether the change will go ahead. I will mark this as resolved, since you mentioned you have a workaround by logging to a CSV. However, do write in again if you have any more questions. Cheers!

system · October 10, 2023, 9:27pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
