Attached are the gradient histograms from my training. It seems that the gradients for lin1.weight and lin2.weight are mostly zero everywhere. Does it mean that the model doesn't learn anything from these parameters, and should I exclude them from my optimizer?
Thank you very much
Hi @thongnt, it's difficult to say why your gradients are zeroed out. Assuming it's not an error in your code, you may be encountering vanishing gradients, which could be leading to overflow / underflow issues. Here are some debugging steps I can suggest:

1) Ensure that you're calling optimizer.zero_grad() before each batch.
2) Try normalizing the weights and inputs.
3) Try implementing gradient clipping.

Please let me know if any of these work for you.
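The steps above can be sketched roughly as follows. This is a minimal illustration with a made-up two-layer model and random data (the layer sizes, learning rate, and clipping threshold are assumptions, not taken from your setup), just to show where each step fits in the training loop:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model in question: two linear layers.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 8)
y = torch.randn(32, 1)

# Step 2: normalize inputs (zero mean, unit variance per feature).
x = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

optimizer.zero_grad()  # Step 1: clear stale gradients before each batch.
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Inspect per-parameter gradient norms to see which ones are near zero.
for name, p in model.named_parameters():
    print(f"{name}: grad norm = {p.grad.norm():.4e}")

# Step 3: clip the global gradient norm to limit overflow from large gradients.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Printing the per-parameter gradient norms after backward() is also a quick way to confirm whether the zeros you see in the histograms are exact zeros or just very small values.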
Hi @thongnt, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!