I feel divergence is better predicted by activations (or update step) than weights or gradients. How to watch that?
Hi @brando, thanks for writing in. Here’s an example report showing activations logged from a TensorFlow model.
We don’t have any examples covering gradients of activations yet, but if you’re looking for model explainability, this GradCAM report might be of interest. Currently wandb.watch can’t log activations for this use case, so I’ll file a feature request to extend its functionality.
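In the meantime, a common workaround is to compute activation statistics yourself each training step and pass them to wandb.log. Here’s a minimal, framework-agnostic sketch using a toy NumPy forward pass (the shapes, weights, and metric names are all illustrative, not part of any W&B API); in PyTorch you’d typically gather the same statistics from forward hooks instead:

```python
import numpy as np

# Toy two-layer MLP forward pass. We record per-layer activation
# statistics -- the kind of signal you could log every step to catch
# divergence early (e.g. exploding activation magnitudes).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))           # hypothetical input batch
w1 = rng.normal(size=(16, 64)) * 0.1    # hypothetical layer weights
w2 = rng.normal(size=(64, 10)) * 0.1

h1 = np.maximum(x @ w1, 0.0)            # ReLU activations, layer 1
h2 = h1 @ w2                            # pre-softmax activations, layer 2

# Scalar summaries of the activations at this step.
stats = {
    "act/h1_mean_abs": float(np.abs(h1).mean()),
    "act/h1_max": float(h1.max()),
    "act/h2_mean_abs": float(np.abs(h2).mean()),
}

# In a real training loop you'd call something like:
#   wandb.log(stats, step=step)
print(stats)
```

Watching these scalars over time (or logging full histograms with wandb.Histogram) should give you the activation-level view you’re after until wandb.watch supports it natively.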
Also, any additional details/context you can share about your use case would be very helpful.