I feel divergence is better predicted by activations (or the update step) than by weights or gradients. How can I watch that?
Hi @brando, thanks for writing in. Here’s an example report in which activations are logged with TensorFlow.
We don’t have any examples w.r.t. gradients of activations yet, but if you’re looking for model explainability, this GradCAM report might be of interest to you. Currently, there’s no way to use
wandb.watch for this use case, but I’ll file a feature request to extend its functionality.
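In the meantime, a common workaround in PyTorch is to capture activations yourself with forward hooks and log summary statistics each step. This is just a minimal sketch, not an official wandb.watch feature; the toy model and layer names are purely illustrative:

```python
import torch
import torch.nn as nn

# Toy model -- a stand-in for whatever network you are training.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

activation_stats = {}

def make_hook(name):
    # Forward hook: record the mean absolute activation of this layer.
    def hook(module, inputs, output):
        activation_stats[name] = output.detach().abs().mean().item()
    return hook

# Attach a hook to each layer we care about.
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.ReLU)):
        module.register_forward_hook(make_hook(name))

x = torch.randn(32, 8)
model(x)  # hooks fire during the forward pass, filling activation_stats

# Inside a training loop you could then log these to W&B, e.g.:
# wandb.log({f"activations/{k}": v for k, v in activation_stats.items()})
print(activation_stats)
```

Plotting these per-layer statistics over training steps should make an activation blow-up visible well before the loss itself diverges.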
Also, it would be very helpful if you could share some additional details/context about your use case.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.