Tensorboard sync shows incorrect number of steps


I have observed strange behavior when syncing TensorBoard runs. Two runs show different lengths in steps once uploaded to wandb, and both are wrong. The difference between them is probably due to multiprocessing. However, if I open the TensorBoard tab in the wandb interface, both runs are displayed correctly.

I can provide the files if I figure out how to attach them here. Or should I upload them somewhere else?

Since I haven’t figured out how to attach log files here, I uploaded them to a third-party website.

Hey @martslaaf, apologies for the delay here. I couldn’t find the files you attached above.

Are you using sync_tensorboard and also making calls to wandb.log in your script? If so, the default step will be incorrect, but the global_step x-axis trick (selecting global_step as the chart's x-axis) should work for the TensorBoard metrics. You might also want to add global_step to your wandb.log calls if you want them to line up with the TensorBoard metrics.
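To illustrate the suggestion above, here is a minimal sketch, assuming a PyTorch `SummaryWriter` and a single process. The project name, log directory, metric names, and the `with_global_step` helper are placeholders I made up for the example, not anything from your script.

```python
def with_global_step(metrics, global_step):
    """Return a copy of `metrics` that also carries the shared step counter."""
    payload = dict(metrics)
    payload["global_step"] = int(global_step)
    return payload


def run_training(num_steps=100):
    # Imported lazily so the helper above can be used without these packages.
    import wandb
    from torch.utils.tensorboard import SummaryWriter

    # "tb-sync-demo" and "runs/demo" are placeholder names.
    run = wandb.init(project="tb-sync-demo", sync_tensorboard=True)
    writer = SummaryWriter(log_dir="runs/demo")

    for global_step in range(num_steps):
        loss = 1.0 / (global_step + 1)  # stand-in for a real training loss

        # TensorBoard metric: written with an explicit global_step.
        writer.add_scalar("train/loss", loss, global_step)

        # Extra metric logged directly: put global_step in the same dict
        # (not in the step= argument) so it can be used as the x-axis.
        wandb.log(with_global_step({"custom/metric": loss * 2}, global_step))

    writer.close()
    run.finish()
```

With both kinds of metrics carrying the same global_step value, choosing global_step as the x-axis in the chart settings should put them on a common scale, even though the default Step column still counts every log call.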

However, if this doesn’t fix your issue, could you please share:

  1. your debug logs (debug.log and debug-internal.log) for the runs that show different step counts?
  2. a minimal script showing how you’re logging, which would help us reproduce the issue on our side.
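If it helps with gathering the logs, the sketch below zips them up for every local run. It assumes the usual local layout of `wandb/run-<timestamp>-<id>/logs/` next to your script; the function name and the output file name are just placeholders, and the glob pattern may need adjusting if your directory looks different.

```python
from pathlib import Path
import zipfile


def collect_debug_logs(wandb_dir="wandb", out_zip="wandb-debug-logs.zip"):
    """Zip debug.log and debug-internal.log from every run folder under wandb_dir."""
    root = Path(wandb_dir)
    # Assumed layout: <wandb_dir>/run-<timestamp>-<id>/logs/debug*.log
    log_files = sorted(root.glob("*run-*/logs/debug*.log"))
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in log_files:
            # Keep the run folder name in the archive so runs stay distinguishable.
            archive.write(path, arcname=str(path.relative_to(root)))
    return [str(p.relative_to(root)) for p in log_files]
```

Calling `collect_debug_logs()` from the directory that contains the `wandb` folder returns the list of files it packed, so you can check that both affected runs are included before sharing the archive.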

Hi @martslaaf, we wanted to follow up on your support request, as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.