How to debug when wandb says a run has crashed but the SLURM job continues to run?

Hello,

I was wondering if you had any tips on how to debug a situation where wandb says that a run has crashed but the SLURM job continues to run.

Thanks,
Nicole

Hi @wongnicolehy,

Thank you for writing in. Could you talk a bit about your setup, as well as your workflow?

Hi there Nicole, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi Nicole, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

I’d like to re-open the conversation. What specifically do you want to know about setup and workflow?
