Does W&B gradient logger work properly with gradient scaler?

Hi, I’m using the Hugging Face Trainer framework to train my model, logging everything to W&B. The W&B histogram logger for gradients shows me the following:

[screenshots of the gradient histograms]

The out_proj layer is the last one in my architecture, so it was very unusual to see such large gradient magnitudes. I took a look inside and found that the gradients are actually small, but they are large during the intermediate step of AMP gradient scaling and unscaling. So I guess that W&B just doesn’t play well with the gradient scaler: it seems to record the scaled gradients before they are unscaled.

Before unscaling:

model_ref.classifier.out_proj.weight.grad
tensor([[  354.6875, -1280.5000,   538.1250,  ...,  1188.5000,  -150.5625,
          1870.5000],
        [   93.0205, -1208.0000,   390.3750,  ...,   534.8750,   -38.1250,
           738.2500],
        [   -5.9375,  2244.5000, -1384.5000,  ...,  -503.1875,   -11.0625,
         -1256.3125],
        [  211.8281,   488.5000,    37.3750,  ..., -1205.5000,   494.1250,
         -1308.5000],
        [ -664.7500,   -67.0000,  -156.6250,  ...,   185.0000,   -59.9531,
           923.7500],
        [   11.0000,  -177.6250,   574.6250,  ...,  -199.1250,  -234.6953,
          -966.3750]], device='cuda:0')

After unscaling:

model_ref.classifier.out_proj.weight.grad
tensor([[ 5.4121e-03, -1.9539e-02,  8.2111e-03,  ...,  1.8135e-02,
         -2.2974e-03,  2.8542e-02],
        [ 1.4194e-03, -1.8433e-02,  5.9566e-03,  ...,  8.1615e-03,
         -5.8174e-04,  1.1265e-02],
        [-9.0599e-05,  3.4248e-02, -2.1126e-02,  ..., -7.6780e-03,
         -1.6880e-04, -1.9170e-02],
        [ 3.2322e-03,  7.4539e-03,  5.7030e-04,  ..., -1.8394e-02,
          7.5397e-03, -1.9966e-02],
        [-1.0143e-02, -1.0223e-03, -2.3899e-03,  ...,  2.8229e-03,
         -9.1481e-04,  1.4095e-02],
        [ 1.6785e-04, -2.7103e-03,  8.7681e-03,  ..., -3.0384e-03,
         -3.5812e-03, -1.4746e-02]], device='cuda:0')
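
For reference, this matches how torch.cuda.amp is designed: backward() runs on the scaled loss, so .grad holds scaled values until scaler.unscale_() is called, and any hook that fires during backward (which is how histogram logging of gradients typically works) sees the scaled values. Below is a minimal standalone sketch with a toy model (nothing from my actual setup) that reproduces the before/after difference:

import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Toy model and data, purely illustrative; requires a CUDA device.
model = nn.Linear(16, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

x = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

with autocast():
    loss = nn.functional.mse_loss(model(x), target)

# backward() on the scaled loss writes *scaled* gradients into .grad,
# so anything reading .grad (or hooked into backward) at this point
# sees values multiplied by the loss scale (65536 by default).
scaler.scale(loss).backward()
print(model.weight.grad.abs().max())  # large: scaled gradients

# unscale_() divides .grad in place by the current scale factor.
scaler.unscale_(opt)
print(model.weight.grad.abs().max())  # small: the true gradients

scaler.step(opt)
scaler.update()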

If that is correct, I think this should be fixed to avoid misunderstanding (e.g. I thought something was wrong with my model).

Hi there! Thanks so much for writing in - let me dig into this a bit and see what might be going on. I’ll follow up here as soon as I have something. In the meantime, don’t hesitate to reach out with anything else you might need 🙂

Hi there!

I wanted to follow up and let you know that I’ve reported this issue to our product team, and we’re looking into adding a fix to our roadmap. I really appreciate your patience as we work on improving this part of the platform.
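
In the meantime, if you have control over the training step, one possible workaround (just a sketch, assuming a custom AMP training loop with a GradScaler, rather than the stock Trainer) is to log gradient histograms manually after scaler.unscale_() instead of relying on hooks that fire during backward:

import wandb

# Sketch of one training step; `model`, `opt`, `scaler`, and `loss`
# are assumed to come from an ordinary AMP training loop.
scaler.scale(loss).backward()
scaler.unscale_(opt)  # .grad now holds the true, unscaled gradients
wandb.log({
    f"gradients/{name}": wandb.Histogram(p.grad.detach().cpu().numpy().flatten())
    for name, p in model.named_parameters()
    if p.grad is not None
})
scaler.step(opt)
scaler.update()

This logs the same kind of histograms, just computed from the unscaled values.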

Please don’t hesitate to reach out if you have any other questions or concerns in the meantime. Thanks again for bringing this to our attention - your feedback is invaluable in helping us make W&B better for all our users.