I am running four experiments on the same system (a Google Cloud VM). One is running fine, but three have frozen: they make no progress, yet the programs are still active and have not errored out. Does anyone know how to fix this?
Hi @usman391 ,
Thank you for reaching out with your support request. We will be glad to look into this to determine the cause.
Can you please provide the following:
- Description of the experiment you are running (single runs from four agents, or parallel processes?)
- Sample code showing how you are initializing/executing runs
- Debug logs for the runs where this error is occurring. They live in the `WANDB_DIR`, which defaults to `./wandb` in your project folder.
Please attach the code sample/logs here or send directly to me at mohammad.bakir@wandb.com.
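If it helps with gathering the logs, here is a minimal sketch for listing them. It assumes the directory is taken from the `WANDB_DIR` environment variable when set and falls back to `./wandb` otherwise, and that the log files match the pattern `debug*.log` somewhere under that directory. The helper name `find_debug_logs` is hypothetical and not part of the wandb library.

```python
import os
from pathlib import Path


def find_debug_logs(root=None):
    """Return debug log files found under the W&B directory, sorted by path."""
    # Assumption: use WANDB_DIR if it is set, otherwise ./wandb in the project folder.
    base = Path(root or os.environ.get("WANDB_DIR", "./wandb"))
    if not base.is_dir():
        # Nothing to collect if the directory does not exist.
        return []
    # Assumption: log files are named debug*.log somewhere below the base directory.
    return sorted(p for p in base.rglob("debug*.log") if p.is_file())


if __name__ == "__main__":
    for path in find_debug_logs():
        print(path)
```

Running this from the project folder prints one path per log file found, which can then be attached to the reply.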
Regards,
Mohammad
Thanks for the response Mohammad Bakir. The error actually went away after I rebooted the system and has not occurred again since. If it occurs again, I will let you know.
Hi @usman391 , thank you for letting us know that this error has gone away. Yes, please do let us know if it occurs again. I will mark this matter closed in the meantime.