I’m logging several metrics which, by virtue of my training data/procedure, are occasionally undefined during training and produce NaNs.
The problem is as follows. With on_step logging, I can see every step where either a number or a NaN was logged. With on_epoch logging, however, the metrics are always reported as NaN. I think this is because on_epoch logging takes the mean of the values recorded over the epoch, and some of those values are NaN.
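To illustrate the suspected cause, here is a minimal sketch in plain Python. The step values are made up, and this is not the logger's actual reduction code, only the arithmetic I assume it performs: a single NaN is enough to turn an ordinary mean into NaN.

```python
import math

# Hypothetical per-step values for one epoch; NaN marks a step
# where the metric was undefined.
step_values = [0.5, math.nan, 0.7, 0.9]

# An ordinary mean propagates NaN: any sum that includes NaN is NaN.
epoch_mean = sum(step_values) / len(step_values)

print(math.isnan(epoch_mean))  # True
```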
Is there some way to specify a nanmean aggregation operation when computing epoch-level metrics?
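To make concrete what I mean by a nanmean aggregation, here is a rough sketch in plain Python. `NanMeanAccumulator` is a hypothetical helper I made up for illustration, not part of any logging API: it keeps a running sum and count over the epoch and skips NaN values.

```python
import math


class NanMeanAccumulator:
    """Hypothetical helper: running mean over an epoch that skips NaN values."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        # Ignore steps where the metric was undefined.
        if not math.isnan(value):
            self.total += value
            self.count += 1

    def compute(self) -> float:
        # If every step was NaN, the epoch value is still undefined.
        return self.total / self.count if self.count else math.nan


acc = NanMeanAccumulator()
for v in [0.5, math.nan, 0.75, 1.0]:  # made-up per-step values
    acc.update(v)
epoch_value = acc.compute()  # mean of the three finite values
```

Ideally the epoch-level logging would do the equivalent of this internally, so I would not have to track the sum and count myself.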
Thank you.