I have a very basic question regarding the Stable Baselines3 integration.
I want to plot basic stuff like the average episode reward. However, I am confused by the terms step and global_step. What is the difference between them?
When plotting global_step over step I was expecting a straight line, but it turns out there is no linear relationship between those two values.
Could someone explain the increment rules for step and global_step?
Also, when looking at the TensorBoard plots from within the WandB dashboard, I can see that the number of steps TensorBoard uses as its x-axis differs from both the global_step and the step used in the wandb plots. Something is very weird.
Hi @erikk, happy to help. At a high level, the RL trainer keeps track of how many steps have been taken during training: its global_step is updated every time a batch is processed. When training metrics are logged to wandb, global_step is used as the x-axis to indicate this progress. The wandb SDK also maintains its own internal step counter, which follows a different increment rule. Because the trainer's global_step and the wandb step are updated under different conditions, they advance at different times, which is why plotting one against the other does not give a straight line.
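To make the difference concrete, here is a toy simulation of the two counters. It does not use wandb or Stable Baselines3 code; the function name `simulate_counters` and the rule "one log call per finished episode" are assumptions chosen purely for illustration. The logger-side step grows by one per log call, while the trainer-side global_step grows by however many environment steps were processed since the previous call.

```python
def simulate_counters(steps_between_logs):
    """Return (step, global_step) pairs for a sequence of log calls.

    steps_between_logs: number of environment steps processed before each
    log call (hypothetical values, e.g. episode lengths).
    """
    global_step = 0  # trainer-side counter: counts processed environment steps
    rows = []
    for step, n_env_steps in enumerate(steps_between_logs):
        # The logger-side `step` is just the index of the log call (+1 each time),
        # while the trainer-side counter jumps by a varying amount.
        global_step += n_env_steps
        rows.append((step, global_step))
    return rows


# Hypothetical episode lengths; each finished episode triggers one log call.
rows = simulate_counters([120, 45, 200, 80, 150])
for step, global_step in rows:
    print(f"step={step}  global_step={global_step}")
```

Because the increments to global_step vary from one log call to the next, plotting global_step against step gives a monotonically increasing curve but not a straight line. It would only be linear if the same number of environment steps were processed between every pair of log calls.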
Regarding the TensorBoard behavior, could you share a link to your workspace, or screenshots of the plots in question? That will help me better understand what you are seeing.