Hi Leslie,
I apologize for not responding earlier. I assumed I would be notified by e-mail when this thread was updated. I probably need to check my settings, or “watch” this thread.
I log via PyTorch Lightning:

from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project=settings.project_name, log_model=True)
wandb_logger.watch(model, log='gradients', log_freq=50, log_graph=True)
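For context, the logger is then handed to the trainer in the usual way. A minimal sketch; the rest of the Trainer setup is omitted, and model is assumed to be the LightningModule from above:

import pytorch_lightning as pl

# attach the W&B logger so all self.log calls go to wandb
trainer = pl.Trainer(logger=wandb_logger)
trainer.fit(model)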
The actual code for the logging is this:
# keys are the k values, left-padded to width 4 with '-' (e.g. 1 -> '---1')
temp_accs_top_k = {f'{k:->4d}': v for k, v in zip(settings.ks, temp_accs)}
lightning_module.log(f'{split}/temp_top-k', temp_accs_top_k, batch_size=lightning_module.batch_size)
That looks a bit odd, I suppose. The code sits in a function that I call from several different pl.LightningModule subclasses; the variable lightning_module refers to that module. The parameter temp_accs_top_k evaluates to (straight from the debugger):
{'---1': 0.00019996000628452748, '---2': 0.00019996000628452748, '---3': 0.00039992001256905496, '---5': 0.0005998800043016672, '--10': 0.0005998800043016672, '--20': 0.001399720087647438, '--50': 0.004199160262942314, '-100': 0.007598480209708214, '1000': 0.08318336308002472}
That is wrong, but I am seeing the values in the graph panels (see the attached screenshot).
I changed the code so that temp_accs_top_k now contains:

{'test/temp_top-k.---1': 0.2963850498199463, 'test/temp_top-k.---2': 0.3962452709674835, 'test/temp_top-k.---3': 0.44557619094848633, 'test/temp_top-k.---5': 0.5052925944328308, 'test/temp_top-k.--10': 0.5733972191810608, 'test/temp_top-k.--20': 0.6277211904525757, 'test/temp_top-k.--50': 0.6810465455055237, 'test/temp_top-k.-100': 0.716596782207489, 'test/temp_top-k.1000': 0.802676260471344}
I log in a loop since PyTorch Lightning can’t log a dict (I believe). I know that wandb can, but I need the batch_size parameter (I have two dataloaders with different sizes/lengths and need to make sure that PyTorch Lightning does not get confused about steps/epochs).
# log each top-k accuracy as its own scalar metric
for k, v in temp_accs_top_k.items():
    lightning_module.log(k, v, batch_size=lightning_module.batch_size)
Update: I just realized that PyTorch Lightning has a log_dict function, which lets me get rid of the awkward for loop.
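For reference, this is roughly what that looks like. A minimal sketch; it assumes a Lightning version in which log_dict, like log, accepts a batch_size argument:

# temp_accs_top_k maps fully qualified metric names to floats,
# e.g. 'test/temp_top-k.---1' -> 0.2963...
lightning_module.log_dict(temp_accs_top_k, batch_size=lightning_module.batch_size)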
So the “bug” is more like “why did it work in the first place (in the graph panels)?”
Hope that’s not too much to digest and that it’s traceable.
Best,
Stephan