Ignore NaNs when computing per-epoch statistics

jonathanking · August 31, 2023, 4:40pm

I’m logging several metrics which, by virtue of my training data/procedure, are occasionally undefined during training and produce NaNs.

The problem is as follows. For on_step level logging, I am able to see each step where a number was logged or a NaN was logged. However, for on_epoch level logging, the metrics are always reported as NaNs. I think this is because the on_epoch level logging is doing a mean of the recorded values over the epoch, and some of these are NaNs.
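To illustrate the suspected cause, here is a minimal sketch in plain Python. The `step_values` list is made up for illustration and does not come from any real run. It shows that a plain mean over values containing a NaN is itself NaN, while a mean restricted to the defined values is not.

```python
import math

# Hypothetical per-step metric values for one epoch; one step was undefined.
step_values = [0.5, float("nan"), 0.7]

# A plain mean propagates the NaN to the epoch-level value.
plain_mean = sum(step_values) / len(step_values)

# A NaN-ignoring mean averages only the defined values,
# and falls back to NaN if every step was undefined.
valid = [v for v in step_values if not math.isnan(v)]
nan_mean = sum(valid) / len(valid) if valid else float("nan")

print(plain_mean, nan_mean)
```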

Is there some way to specify a nanmean aggregation operation when computing epoch level metrics?

Thank you.

raphael-sanandres · September 5, 2023, 5:43pm

Hello @jonathanking !

Are you using the W&B PyTorch Lightning integration? If so, wandb simply records the values that PTL provides it. You are correct that PTL logs an aggregation when you enable on_epoch logging. By default that aggregation is a mean over the step values, so a single NaN within an epoch makes the epoch-level value NaN. To log a different aggregation, you will have to define your own custom reduction as explained in PTL’s documentation.
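One possible way to get a nanmean-style epoch value is to accumulate the step values yourself and log the result once per epoch. The sketch below is plain Python and is only an assumption about how this could be done, not PTL's or W&B's own mechanism. The class name `NanMeanAccumulator` and its methods are hypothetical.

```python
import math


class NanMeanAccumulator:
    """Running mean over an epoch that skips NaN values (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Accept anything convertible to float (for example a Python number).
        value = float(value)
        if not math.isnan(value):
            self.total += value
            self.count += 1

    def compute(self):
        # NaN only if no defined value was seen during the epoch.
        return self.total / self.count if self.count else float("nan")

    def reset(self):
        self.total = 0.0
        self.count = 0


acc = NanMeanAccumulator()
for v in [0.5, float("nan"), 0.7]:
    acc.update(v)
epoch_value = acc.compute()
print(epoch_value)
```

In a LightningModule, one could call `update` in `training_step`, then log `compute()` and call `reset()` in `on_train_epoch_end`, rather than relying on on_epoch aggregation. If the values are tensors, `torch.nanmean` computes the same kind of NaN-ignoring mean.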

raphael-sanandres · September 7, 2023, 7:06pm

Hi Jonathan, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · November 4, 2023, 5:43pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.