Strange global_step restarts affecting learning rates and performance?

I’m seeing the weirdest global_step restarts during my training runs, and they do affect the optimizer: the warmup restarts every time global_step drops back to zero.

This is in JAX, using a joined schedule from Optax to create the learning rate function. I log the step number separately when doing evals and that counter seems fine, but for some reason the values logged in wandb are pretty strange.
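
The schedule is built along these lines (assuming optax.join_schedules with a linear warmup; the decay type and all values below are just placeholders):

import optax

warmup_steps = 1_000   # placeholder
total_steps = 100_000  # placeholder
peak_lr = 3e-4         # placeholder

# Linear warmup followed by a decay schedule, joined at warmup_steps.
warmup_fn = optax.linear_schedule(init_value=0.0, end_value=peak_lr, transition_steps=warmup_steps)
decay_fn = optax.cosine_decay_schedule(init_value=peak_lr, decay_steps=total_steps - warmup_steps)
learning_rate_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[warmup_steps])

# The resulting schedule is driven by the optimizer's own step count,
# independently of whatever step wandb uses for plotting.
optimizer = optax.adamw(learning_rate=learning_rate_fn)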

Hi @versae,

Thank you for reaching out for support. I’ll check this on our end and we’ll get back to you with updates.

Hi @versae,

The issue you’re experiencing might be due to multiple calls to wandb.log for the same training step. The wandb SDK keeps its own internal step counter, which is incremented every time wandb.log is called, so it can fall out of alignment with the training step in your training loop.
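
For example, a loop like this one (stand-in values; only the logging pattern matters) advances wandb’s internal step twice whenever an eval runs:

import wandb

wandb.init(project="demo")  # placeholder project name

num_steps, eval_every = 100, 10
for step in range(num_steps):
    loss = float(step)                      # stand-in for the real training loss
    wandb.log({"train/loss": loss})         # internal wandb step += 1
    if step % eval_every == 0:
        wandb.log({"eval/loss": loss * 2})  # internal wandb step += 1 again
# wandb's internal step ends up larger than `step`, so charts plotted against
# the default step no longer line up with the training step.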

To avoid this, you can specifically define your x-axis step using wandb.define_metric. You only need to do this once, after wandb.init is called. Here is an example:

wandb.init(...)
wandb.define_metric("*", step_metric="global_step")

The glob pattern, “*”, means that every metric will use “global_step” as the x-axis in your charts. If you only want certain metrics to be logged against “global_step”, you can specify them instead:

wandb.define_metric("train/loss", step_metric="global_step")

With this in place, your charts are plotted against your own global_step rather than the internal wandb step counter, so they should no longer appear to restart from zero.
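
One thing to note: when a step_metric is set this way, you also include "global_step" as a value in each wandb.log call so the charts have something to plot against. A minimal sketch (metric names here are illustrative):

wandb.log({"train/loss": loss, "learning_rate": lr, "global_step": step})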

Hi @versae,

I just want to follow up to see if this helped and whether you still need assistance.

Regards,
Carlo Argel

Hi!

We’re still investigating. Thanks for your help!

Cheers.

Hi @versae,

Thank you for informing us. We are closing this out for internal tracking purposes. You can write back anytime you are ready to proceed.

Regards,
Carlo Argel
