New metrics for old epochs

Hi there! I am using WandB via PyTorch Lightning on an on-premise cluster. I realised that I missed some metrics on my previous runs, which are already logged to WandB. After I recompute those metrics by restoring from old checkpoints, is it possible to get them to show in the dashboard? Computing the metrics from old checkpoints is no problem; getting them to show in the dashboard is the issue.

Here is what I am trying to achieve, in pseudocode:

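import wandb
import torch
import lightning.pytorch as pl

# get_old_run_id_from_command_line, get_old_checkpoint_path and the init_* helpers
# stand in for my own code; old_epoch is the epoch I want to backfill.
run_id = get_old_run_id_from_command_line()
cp_path = get_old_checkpoint_path(run_id, old_epoch)
cp = torch.load(cp_path, map_location='cpu')
step = cp['global_step']  # global step at which the old checkpoint was saved

trainer = init_trainer()
lm = init_lightning_module()
dm = init_data_module()
# Recompute the validation metrics from the old checkpoint
trainer.validate(lm, datamodule=dm, ckpt_path=cp_path)

# Reattach to the old run and log the new metric at the old step
run = wandb.init(id=run_id, resume="allow")
metrics = {'new_metric': lm.new_metric.compute().to('cpu').item()}
wandb.log(metrics, step=step, commit=True)
run.finish()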

At this point I expected to see a new chart titled new_metric in the dashboard for the old run, with a single point at old_epoch. A new chart does appear, but it is empty. The run summary that wandb prints at the end shows that it used the last epoch of the old run, not the step I passed in.

Am I missing something here?

Hi @evgeny-tanhilevich, is it possible that you are trying to log to a step lower than the last logged step? This currently isn’t supported, but you can work around it by plotting against a different metric, using the define_metric workflow shown here.
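A minimal sketch of the define_metric workaround, assuming the new metric has already been recomputed into a dict that maps each old epoch to its value. The names build_backfill_rows, backfill_run and values_by_epoch, and the default x-axis name backfill_epoch, are hypothetical. The idea is that each logged row carries the old epoch as an ordinary metric, so the rows can be appended at wandb's current step (which is allowed) while the chart places them using that metric as its x-axis. I have not checked how this interacts with any metric definitions the original run already set up, so treat it as a starting point rather than a tested recipe.

```python
def build_backfill_rows(values_by_epoch, metric_name="new_metric", x_axis="backfill_epoch"):
    """Turn {old_epoch: value} into log rows that carry their own x-axis value.

    Because the x-axis value travels inside each row, the rows can be logged
    at wandb's current (increasing) step and still be plotted at the old epoch.
    """
    return [
        {x_axis: epoch, metric_name: value}
        for epoch, value in sorted(values_by_epoch.items())
    ]


def backfill_run(run_id, project, values_by_epoch,
                 metric_name="new_metric", x_axis="backfill_epoch"):
    # Imported here so build_backfill_rows can be used without wandb installed.
    import wandb

    # Reattach to the existing run; resume="must" fails if run_id does not exist.
    run = wandb.init(project=project, id=run_id, resume="must")
    # Plot metric_name against x_axis instead of wandb's internal step.
    run.define_metric(x_axis)
    run.define_metric(metric_name, step_metric=x_axis)
    for row in build_backfill_rows(values_by_epoch, metric_name, x_axis):
        # No step= argument: wandb appends each row after the last logged step.
        run.log(row)
    run.finish()
```

For example, backfill_run("abc123", "my-project", {4: 0.81, 9: 0.86}) would be expected to add two points to the new_metric chart at x = 4 and x = 9 (the run ID, project and values are placeholders). To compare against the original metrics, their charts would also need an x-axis holding the same epoch values, for instance an epoch metric if the original run logged one.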

Hi @nathank, thanks for getting back to me. Yes, you are correct: I am trying to log to a historic step (a historic epoch). I have checked the documentation for define_metric, as you suggested. I am not sure this workaround will work for me, as I cannot see how I would define the correspondence between a new step metric (e.g. validation_epoch) and the old epochs that I have already logged. I could try other workarounds, such as creating a new run and grouping it with the old one (explained here), but this all seems too complicated for what I need. I am OK with just logging my torch metrics to a CSV file for now.
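For the CSV fallback, a minimal sketch using only the standard library: append one row per recomputed checkpoint and write the header on first use. The function name append_metrics_row and the column names in FIELDS are hypothetical. Lightning's built-in CSVLogger is an alternative if you would rather keep logging inside the Trainer.

```python
import csv
from pathlib import Path

# Hypothetical column layout; adjust to the metrics you actually recompute.
FIELDS = ("run_id", "epoch", "global_step", "new_metric")


def append_metrics_row(csv_path, row, fieldnames=FIELDS):
    """Append one row to csv_path, writing the header if the file is new or empty."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        # DictWriter raises ValueError if row has keys not listed in fieldnames;
        # missing keys are written as empty cells.
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

In the loop from the original post, this could be called after trainer.validate, e.g. append_metrics_row("backfilled_metrics.csv", {"run_id": run_id, "epoch": cp["epoch"], "global_step": cp["global_step"], "new_metric": value}), where reading cp["epoch"] assumes a standard Lightning checkpoint that stores the epoch alongside global_step.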

Hi @evgeny-tanhilevich, please note that we have an active feature request for modifying run history at a particular step. We will provide an update once a decision has been made on whether the change will go ahead. I will mark this as resolved, since you mentioned you have a workaround of logging to a CSV file. However, do write in again if you have any more questions. Cheers!
