When is one supposed to run wandb.watch so that Weights & Biases tracks params and gradients?

There are two things you might be running into here. I can’t confirm which, because your code relies on the ultimate-utils package.

  1. wandb.watch will only start working once you call wandb.log after a backward pass that touches the watched Module (docs).
  2. The frequency with which gradients/params are logged is controlled by the log_freq argument (default 1000). If the number of logging calls is less than the value of log_freq, then no information will be logged. Here’s a short colab reproducing this behavior.

Also, if you want params and gradients, you need to set the log kwarg to "all". By default, we log only gradients.
