I am using wandb.watch() to plot histograms of my model’s parameters and gradients. I found that the gradients of some parameters (call them set A) are plotted twice. Set A’s weights are plotted once, as expected, and so are both the weights and gradients of the remaining parameters (set B).
For example, I train a model for 1000 steps with wandb.watch()'s log_freq set to 100. In this case, A’s gradients are plotted at timesteps [0,1,2,3,4,5,…,19] (20 plots, unexpected). A’s weights, B’s weights, and B’s gradients are plotted at timesteps [0,2,4,…,18] (10 plots, as expected). They are spaced by 2, though, which I guess is caused by the extra plots of A’s gradients.
After some searching online, I found a similar issue reported here. However, I am not using checkpointing in my model.
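To narrow this down, one quick check is to count how many times each parameter’s gradient hook fires during a single training step. This is a minimal sketch that assumes wandb.watch() collects gradient histograms through per-parameter backward hooks (my understanding of its PyTorch integration, not something I have verified). The helper names (count_grad_hook_calls, step_single_backward, step_two_backwards) and the toy model are hypothetical and not taken from my actual code. The second step function uses a hypothetical pattern (a separately backpropagated auxiliary loss on the first layer) just to show what a "fires twice" parameter looks like.

```python
from collections import Counter

import torch
import torch.nn as nn


def count_grad_hook_calls(model, run_step):
    """Count how often each parameter's gradient hook fires while run_step() runs.

    A count above 1 means the parameter received a gradient more than once in
    that step, which could make a hook-based logger record extra histograms.
    """
    counts = Counter()
    handles = []
    for name, param in model.named_parameters():
        if param.requires_grad:
            # Bind `name` as a default argument so each hook records its own parameter.
            handles.append(param.register_hook(lambda grad, n=name: counts.update([n])))
    try:
        run_step()
    finally:
        # Always detach the counting hooks so they do not leak into training.
        for handle in handles:
            handle.remove()
    return counts


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(16, 4)


def step_single_backward():
    # One loss, one backward call: every hook should fire exactly once.
    model.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()


def step_two_backwards():
    # Hypothetical pattern: an auxiliary loss on the first layer is
    # backpropagated separately, so that layer's hooks fire a second time.
    model.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()
    aux = model[0](x).abs().mean()
    aux.backward()


print(count_grad_hook_calls(model, step_single_backward))
print(count_grad_hook_calls(model, step_two_backwards))
```

In the second case, only the first layer’s parameters ("0.weight", "0.bias") report 2 calls, which would resemble my set A. Running the same counter around my real training step should show whether set A’s hooks really fire twice per step.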
Any hints on the issue above? Thanks!