wandb.finish() takes too long to finish

I have been using wandb to track my experiments.
The experiments run on a GKE cluster using MLflow Projects.
However, starting several weeks ago (I suspect around the release of wandb 0.15.0), my training job no longer exits right after training finishes. Instead, the process exits almost 24 hours later.
I didn’t make any breaking changes to my code, so I suspect something changed on the wandb side.
For that reason, I have pinned wandb to 0.14.2 instead of 0.15.0.
Could I get some help with this?
Here is the last log output from the running process.


wandb: Waiting for W&B process to finish… (success).
wandb: Network error (ReadTimeout), entering retry loop.

Hello @axb-data !

If there have not been any changes on your side and you are only now experiencing this error, could you send me the debug logs for this run?

They should be located in the wandb folder in the same directory where the script was run. The wandb folder contains one folder per run, named in the format run-DATETIME-ID. Could you retrieve the debug.log and debug-internal.log files from the folder that belongs to the run that is having issues?
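
If it helps to locate those files, here is a minimal Python sketch that lists them. It assumes the wandb folder sits in the current working directory and searches each run-* folder recursively, since the log files may be in a subfolder. The function name find_debug_logs is only illustrative and is not part of the wandb API.

```python
from pathlib import Path

# File names mentioned above; everything else in the run folders is ignored.
LOG_NAMES = ("debug.log", "debug-internal.log")


def find_debug_logs(wandb_dir="wandb"):
    """Map each run-* folder name under wandb_dir to the debug log paths inside it.

    Hypothetical helper for illustration only, not a wandb function.
    """
    found = {}
    root = Path(wandb_dir)
    if not root.is_dir():
        # No wandb folder here; the script was probably run from another directory.
        return found
    for run_dir in sorted(root.glob("run-*")):
        if not run_dir.is_dir():
            continue
        # Search recursively because the logs may be nested in a subfolder.
        logs = sorted(
            p for p in run_dir.rglob("*")
            if p.is_file() and p.name in LOG_NAMES
        )
        if logs:
            found[run_dir.name] = logs
    return found


if __name__ == "__main__":
    for run_name, log_paths in find_debug_logs().items():
        print(run_name)
        for path in log_paths:
            print("   ", path)
```

Run it from the directory where the training script was started, then pick the folder whose DATETIME matches the run that hung.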
