I’m logging several metrics which, by virtue of my training data/procedure, are occasionally undefined during training and produce NaNs.
The problem is as follows. With on_step level logging, I can see every step, whether a number or a NaN was logged. With on_epoch level logging, however, the metrics are always reported as NaN. I think this is because on_epoch level logging takes the mean of the values recorded over the epoch, and a single NaN among them is enough to make the whole mean NaN.
Is there some way to specify a nanmean aggregation operation when computing epoch level metrics?
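
To make the desired behaviour concrete, here is a minimal sketch in plain Python of the difference between a plain mean and a nanmean over one epoch. The `NanMeanAccumulator` class and the `step_values` list are hypothetical and only for illustration; they are not part of any library API.

```python
import math


class NanMeanAccumulator:
    """Hypothetical helper: a running mean that skips NaN values."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # float() also accepts scalar-like objects such as 0-dim tensors.
        value = float(value)
        if not math.isnan(value):
            self.total += value
            self.count += 1

    def compute(self):
        # Stays NaN only if every value seen so far was NaN.
        return self.total / self.count if self.count else math.nan

    def reset(self):
        self.total = 0.0
        self.count = 0


# Made-up per-step values for one epoch; two steps were undefined.
step_values = [0.5, math.nan, 1.5, math.nan, 1.0]

# Plain mean: one NaN poisons the whole epoch value.
plain_mean = sum(step_values) / len(step_values)

# nanmean: undefined steps are ignored.
acc = NanMeanAccumulator()
for v in step_values:
    acc.update(v)
nan_mean = acc.compute()

print(plain_mean, nan_mean)
```

One possible workaround along these lines, assuming nothing built in does this already, would be to call `update` on every step, log `compute()` once at the end of the epoch, and then call `reset`.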