Hi there! I am using WandB via PyTorch Lightning on an on-premise cluster. I realised that I missed some metrics on my previous runs, which are already logged to WandB. Is it possible to get the new metrics to show in the dashboard after I recompute them by restoring from old checkpoints? Computing the metrics from old checkpoints is no problem; getting them to show in the dashboard is the issue.
UPDATE
Here is what I am trying to achieve, in pseudocode:
import wandb
import torch
import lightning.pytorch as pl
# Locate the old run and the checkpoint to recompute metrics from
run_id = get_old_run_id_from_command_line()
cp_path = get_old_checkpoint_path(run_id, old_epoch)
# Read the global step that was stored in the checkpoint
cp = torch.load(cp_path, map_location='cpu')
step = cp['global_step']
# Re-run validation from the checkpoint so the new metric gets populated
trainer = init_trainer()
lm = init_lightning_module()
dm = init_data_module()
trainer.validate(lm, datamodule=dm, ckpt_path=cp_path)
# Resume the old run and log the new metric at the old step
run = wandb.init(id=run_id, resume="allow")
metrics = {'new_metric': lm.new_metric.compute().to('cpu').item()}
wandb.log(metrics, step=step, commit=True)
run.finish()
At this point I am expecting to see a new chart in my dashboard for the old run, titled new_metric, with a single dot corresponding to the old_epoch. I do see a new chart, but it is empty. In the Run Summary that is printed by wandb at the end, I can see that it uses the last epoch of the old run, not the step that I am passing in.
Am I missing something here?
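One direction that might be worth trying, sketched here under assumptions and not verified against this run: as far as I understand, W&B's internal step can only increase, so a step= value lower than the resumed run's current step is likely to be dropped. A possible workaround is to log the old step as an ordinary value and use define_metric to make it the x-axis for the new metric. The helper names build_backfill_payload and backfill_metric are hypothetical, and the step key "trainer/global_step" is an assumption (I believe it is what Lightning's WandbLogger uses, but it should be checked against the existing run).

```python
def build_backfill_payload(metric_name, value, step, step_key="trainer/global_step"):
    """Bundle the metric with its own x-axis value.

    The point is then positioned by `step_key` (an assumed key name),
    not by W&B's internal step counter.
    """
    return {metric_name: float(value), step_key: int(step)}


def backfill_metric(run_id, metric_name, value, step, step_key="trainer/global_step"):
    """Hypothetical helper: attach one recomputed metric value to an old run."""
    import wandb  # imported lazily so the helper above works without wandb installed

    # Same resume call as in the pseudocode above; the project is assumed
    # to be configured elsewhere (for example via environment variables).
    run = wandb.init(id=run_id, resume="allow")
    # Ask W&B to plot `metric_name` against `step_key` instead of the internal step.
    run.define_metric(step_key)
    run.define_metric(metric_name, step_metric=step_key)
    # No `step=` argument here: the internal step just advances past the old run's last step.
    run.log(build_backfill_payload(metric_name, value, step, step_key))
    run.finish()
```

With the values from the pseudocode, the call would look like backfill_metric(run_id, "new_metric", lm.new_metric.compute().to("cpu").item(), step). Whether the resulting chart lines up with the existing charts depends on which x-axis those charts use.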