Records partly lost in training runs

Identical problem to "All records are lost in a project without any action".

That problem was solved, though, and mine is not: some of my metrics have been completely deleted except for a single point.
I’m a new user so I can’t post images, but my W&B boards look almost exactly the same as the ones in the link I provided.

I’ve noticed an interesting pattern: the first model trained on any given device (e.g. my personal laptop or a VM) retains the entire train loss curve, but every model trained after that on the same device loses all of its data. Moreover, the training data is logged perfectly fine during the run; it is only after the run has finished that the data is deleted.

Hope someone can help out, thanks!

Hi @samitizerxu, thank you for reaching out to W&B and I am sorry to hear you are experiencing this.

To help us troubleshoot this, would you mind sharing the following information:

  • The name of the project in which you are seeing this issue
  • The name of a panel where you are seeing missing data
  • The debug.log and debug-internal.log from the ./wandb/run-date_time-runid/logs folder
  • Are you currently using the resume Run feature when running from the same device?
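
If it helps with gathering the log files, here is a minimal sketch that lists the debug.log and debug-internal.log files under each run folder. It assumes the default ./wandb directory layout described above and that run folders start with "run-" (offline runs or a custom directory would need a different pattern). The helper name find_run_logs is hypothetical and is not part of the wandb library.

```python
from pathlib import Path

# File names requested above; assumed to live in <run folder>/logs/.
LOG_NAMES = ("debug.log", "debug-internal.log")


def find_run_logs(wandb_dir="./wandb"):
    """Map each run folder name to the log files that exist inside it.

    Hypothetical helper: it only walks the local directory and does not
    call the wandb library.
    """
    root = Path(wandb_dir)
    found = {}
    if not root.is_dir():
        # No local wandb directory here, so there is nothing to report.
        return found
    for run_dir in sorted(root.glob("run-*")):
        logs_dir = run_dir / "logs"
        present = [
            logs_dir / name
            for name in LOG_NAMES
            if (logs_dir / name).is_file()
        ]
        if present:
            found[run_dir.name] = present
    return found


if __name__ == "__main__":
    for run_name, paths in find_run_logs().items():
        print(run_name)
        for path in paths:
            print("   ", path)
```

Running it from the directory you launch training from should print one entry per run, so you can pick out the logs for a run that kept its data and for one that lost it.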

Thanks!
Francesco

Hi @samitizerxu, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi @samitizerxu, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!