Hi @erikk , happy to help. At a high level, the RL trainers keep track of how many steps have been taken during training: `global_step` is incremented every time a batch is processed. When logging training metrics to wandb, `global_step` is used as the x-axis. The wandb SDK also has its own internal step counter, which follows a different increment rule, so the trainer's `global_step` and wandb's step can end up being updated at different times.
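To illustrate the divergence, here is a minimal sketch (not the actual trainer code, and `logging_steps` is an assumed config value): the trainer-side counter advances once per batch, while wandb's internal counter advances once per `wandb.log()` call, so if you only log every N batches the two drift apart.

```python
# Hypothetical illustration of the two counters diverging.
logging_steps = 10          # assumed: log metrics every 10 batches
global_step = 0             # trainer-side counter, +1 per batch
wandb_step = 0              # stand-in for wandb's internal counter, +1 per log call

logged_pairs = []
for batch in range(50):     # pretend we process 50 batches
    global_step += 1        # updated on every batch
    if global_step % logging_steps == 0:
        # In real code this is where wandb.log(...) would be called;
        # without an explicit step=, wandb uses its own counter.
        logged_pairs.append((global_step, wandb_step))
        wandb_step += 1     # wandb only increments when something is logged

print(logged_pairs)  # → [(10, 0), (20, 1), (30, 2), (40, 3), (50, 4)]
```

Passing `step=global_step` to `wandb.log()` pins the x-axis to the trainer's counter, which is one way to keep the two in sync.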
Regarding the TensorBoard behavior you are seeing, could you provide a link to your workspace, or screenshots of what you are seeing? That will help me understand the issue better.