I’ve been using wandb sweeps, and I noticed that after each run finishes, the following message shows up:
wandb: Waiting for W&B process to finish… (success)
but then 10 minutes pass with nothing happening.
Only after this long wait does wandb show the run history and summary and start a new run.
I’m using wandb in a Gradient Paperspace notebook, running it from the terminal.
I haven’t found anyone else with this issue, so it may be something wrong on my side.
Do you have any idea what the problem might be?
Hi @ogait, in your project workspace, on the overview page for the sweep and for the runs associated with it, what status do they show? Additionally, we can take a look at your debug bundles to check whether anything in them points to the cause. These are the debug.log and debug-internal.log files, located in each run's folder inside the wandb directory of your project's working directory. Please provide the logs for the runs where you are seeing this issue.
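To gather those files, a small sketch like the one below can list every debug*.log under the wandb folder along with its size, which also helps when the files turn out to be too large to attach. It assumes the default wandb directory sits in the project's working directory, and find_debug_logs is a hypothetical helper name, not part of wandb.

```python
from pathlib import Path
from typing import List, Tuple


def find_debug_logs(project_dir: str = ".") -> List[Tuple[Path, int]]:
    """Return (path, size_in_bytes) for every debug*.log under <project_dir>/wandb.

    Searches recursively, so it does not depend on the exact per-run folder
    layout (assumed to be something like wandb/run-<id>/logs/).
    """
    wandb_dir = Path(project_dir) / "wandb"
    if not wandb_dir.is_dir():
        return []
    logs = []
    for path in sorted(wandb_dir.rglob("debug*.log")):
        # Skip symlinks (e.g. a possible top-level shortcut to the latest
        # run's log) so each file is reported once, and skip non-files.
        if path.is_file() and not path.is_symlink():
            logs.append((path, path.stat().st_size))
    return logs


if __name__ == "__main__":
    # Print each log with its size in megabytes.
    for path, size in find_debug_logs():
        print(f"{size / 1e6:8.2f} MB  {path}")
```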
Hello Mohammad, thanks for the response.
When this happens, the status of both the sweep and the run is “running”.
Here are example debug.log and debug-internal.log files. Unfortunately, they are too big to include in this message, so I uploaded them to Easyupload.io.
Hi @ogait, thank you for providing the files. After reviewing them, it appears this is caused by a performance bug on our end: the run's exit response hangs until all of the run's data has synced, for example: [wandb_run.py:_on_finish():2221] got exit ret: None. This bug shows up in runs with a large number of .log calls. It is currently in the Selected For Development stage, and I will update you here when there has been movement.
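To check whether your own runs log the same entry, a minimal sketch can scan a debug.log for "got exit ret" lines and report the line number and value of each. The function find_exit_rets is hypothetical, not a wandb API. The regular expression anchors only on the message text, since the source file, function name, and line number in the bracketed prefix may differ between wandb versions.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   ... [wandb_run.py:_on_finish():2221] got exit ret: None
# and captures whatever follows "got exit ret:".
EXIT_RET = re.compile(r"got exit ret:\s*(.*)$")


def find_exit_rets(log_path: str) -> List[Tuple[int, str]]:
    """Return (line_number, value) for each 'got exit ret' entry in a log file."""
    hits = []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            m = EXIT_RET.search(line.rstrip("\n"))
            if m:
                hits.append((lineno, m.group(1)))
    return hits
```

For example, find_exit_rets("wandb/run-<id>/logs/debug.log") would return entries like (1234, "None") if the same message appears in that run's log.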
Thanks for the update Mohammad!
Can I do something to reduce the number of log calls?
I am using fastai with the following callback: WandbCallback(log_model=False, log_preds=False), and at the end of training I use wandb.summary to save six simple variables.
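One generic way to cut down the number of log calls, assuming you log metrics yourself rather than through a callback, is to average values over a window of steps and send a single call per window. The BufferedLogger class below is a hypothetical helper, not part of wandb or fastai, and it does not change how fastai's WandbCallback logs internally.

```python
from collections import defaultdict
from typing import Callable, Dict


class BufferedLogger:
    """Hypothetical helper: average scalar metrics over `every` steps and
    forward them with one call to `log_fn` (e.g. wandb.log) per window."""

    def __init__(self, log_fn: Callable[[Dict[str, float]], None], every: int = 50):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.log_fn = log_fn
        self.every = every
        self._sums: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)
        self._steps = 0

    def add(self, metrics: Dict[str, float]) -> None:
        """Record one step's metrics; flush once `every` steps have accumulated."""
        for key, value in metrics.items():
            self._sums[key] += float(value)
            self._counts[key] += 1
        self._steps += 1
        if self._steps >= self.every:
            self.flush()

    def flush(self) -> None:
        """Send per-key averages in a single log_fn call and reset the buffer."""
        if self._steps == 0:
            return
        averaged = {k: self._sums[k] / self._counts[k] for k in self._sums}
        self.log_fn(averaged)
        self._sums.clear()
        self._counts.clear()
        self._steps = 0
```

For example, create buf = BufferedLogger(wandb.log, every=100), call buf.add({"loss": loss}) on every step, and call buf.flush() once training ends. Because each wandb.log call without an explicit step advances the step counter once, the x-axis in the UI would then count windows rather than training steps.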
A few years ago, you reached out to us about wandb syncing data very slowly. I am pleased to announce that our engineering team has rebuilt our SDK from the ground up with a focus on performance, with gains of up to 88% when logging from multiple processes. To try out the new SDK, upgrade to wandb ≥ v0.17.3 and add wandb.require("core") to your scripts. We would love for you to try it out and share your thoughts on the impact it has on your experiment runs. If you have any questions, please let us know.
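As a sketch of the upgrade path described in this reply, the helper below reads the installed wandb version with importlib.metadata and calls wandb.require("core") only when that version is at least 0.17.3. The names parse_version, supports_core and enable_core_if_available are hypothetical, and the version parser is deliberately simplified: pre-release suffixes are ignored, so "0.17.3rc1" is treated like "0.17.3".

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Minimum wandb version named in the W&B reply for the new core SDK.
MIN_CORE_VERSION = (0, 17, 3)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading major.minor.patch numbers; missing parts become 0."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if not m:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(g) if g else 0 for g in m.groups())


def supports_core(version_text: str) -> bool:
    """True if the given version is at least MIN_CORE_VERSION."""
    return parse_version(version_text) >= MIN_CORE_VERSION


def enable_core_if_available() -> bool:
    """Call wandb.require("core") when the installed wandb is new enough.

    Returns True if the core SDK was requested, False otherwise.
    """
    try:
        installed = version("wandb")
    except PackageNotFoundError:
        return False
    if not supports_core(installed):
        return False
    import wandb  # imported lazily so this module loads without wandb installed

    wandb.require("core")
    return True
```

In a sweep script, enable_core_if_available() would presumably be called near the top, before wandb.init() or wandb.agent().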