Why does wandb.watch() monitor some parameters' gradients twice?

I am using wandb.watch() to plot histograms of my model's parameters and gradients. I found that some parameters' gradients (call them set A) are logged twice as often as expected, while those parameters themselves, the other parameters (call them set B), and B's gradients are all logged once per interval, as expected.

For example, I train a model for 1000 steps with wandb.watch()'s log_freq set to 100. In this case, A's gradients are plotted at timesteps [0, 1, 2, 3, 4, 5, ..., 19] (20 plots, unexpected), while A's weights, B's weights, and B's gradients are plotted at timesteps [0, 2, 4, ..., 18] (10 plots, as expected, although they are spaced by 2, which I guess is a side effect of A's extra gradient plots).
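For reference, this is roughly how I set things up (the model, optimizer, and data below are placeholders; only the wandb.watch() arguments and the step count match my actual run):

```python
import torch
import torch.nn as nn
import wandb

wandb.init(project="my-project")  # placeholder project name

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Log histograms of both parameters and gradients every 100 steps.
wandb.watch(model, log="all", log_freq=100)

for step in range(1000):
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    wandb.log({"loss": loss.item()})
```

With log_freq=100 over 1000 steps, I would expect 10 histograms per parameter, which is what I see for everything except A's gradients.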

I did some searching online, and a similar issue was reported here. But I don't use checkpointing in my model.
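For context, the issue I found involves activation checkpointing along these lines (a sketch of what that issue describes, not my code):

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        # Recomputes this block's forward during backward to save memory;
        # the referenced issue reports doubled gradient logging for
        # parameters inside such checkpointed segments.
        return checkpoint(self.linear, x)
```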

Any hints on the issue above? Thanks!

Hello @peter2578 !

Would you be able to share the following:

  • A link to your workspace? If the project is private, only wandb employees will be able to review it.
  • A code snippet of your experiment

What may be happening is that the gradients for set A's parameters are being logged more often purely because of how they are computed during the backward pass. This may be a PyTorch behavior, but I can certainly check. One way to verify it yourself is sketched below.
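To my understanding, wandb.watch() logs gradients from backward hooks on the parameters, so a parameter whose hook fires twice per backward pass would be logged twice as often. You can count hook firings outside of wandb with plain PyTorch (the model and loss below are placeholders; swap in your own):

```python
import collections
import torch
import torch.nn as nn

def count_grad_hook_calls(model, loss_fn):
    """Count how many times each parameter's gradient hook fires
    during one backward pass (normally: once per parameter)."""
    counts = collections.Counter()
    handles = [
        # Leaf-tensor hooks fire each time a gradient for the
        # parameter is produced; returning None leaves grads unchanged.
        p.register_hook(lambda grad, name=name: counts.update([name]))
        for name, p in model.named_parameters()
    ]
    loss_fn().backward()
    for h in handles:
        h.remove()
    return counts

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(8, 16), torch.randn(8, 1)
counts = count_grad_hook_calls(
    model, lambda: nn.functional.mse_loss(model(x), y))
print(counts)  # a count > 1 for any parameter would match set A's behavior
```

If some parameters show a count above 1, that would point to the model's backward pass rather than to wandb itself.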

Hi Peter, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!
