When I run parallelized sweeps, wandb keeps waiting and doesn't terminate, even though all the runs show a finished status and everything has been logged. What could be causing this, and how can I resolve it?
Hello Banooqa!
wandb uploads your sweeps and runs to the wandb server asynchronously, which means larger sweeps and runs may still be uploading after your training has finished. Depending on how large the run is, you may also be hitting our rate limits, which restrict the number of API calls and parallel agents allowed per IP address and API key. To prevent this, reducing the size of the data you log or decreasing the number of wandb.log calls your script makes per second can also improve performance, as outlined here.
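As a rough illustration of cutting down the number of wandb.log calls, here is a minimal sketch that averages metrics locally and only forwards them once every N steps. ThrottledLogger and the every parameter are hypothetical names made up for this example, not part of the wandb API; the only wandb calls assumed are wandb.log(data, step=...) and wandb.finish(), and both appear only in comments so the snippet runs without wandb installed.

```python
from typing import Callable, Dict


class ThrottledLogger:
    """Hypothetical helper: average metrics locally, emit once every `every` steps."""

    def __init__(self, log_fn: Callable[..., None], every: int = 50) -> None:
        if every < 1:
            raise ValueError("every must be at least 1")
        self.log_fn = log_fn      # e.g. wandb.log (assumed signature: log(data, step=...))
        self.every = every
        self._sums: Dict[str, float] = {}
        self._count = 0

    def log(self, metrics: Dict[str, float], step: int) -> None:
        # Accumulate running sums instead of sending every step to the server.
        for key, value in metrics.items():
            self._sums[key] = self._sums.get(key, 0.0) + float(value)
        self._count += 1
        if self._count >= self.every:
            self.flush(step)

    def flush(self, step: int) -> None:
        # Send the mean of everything buffered so far, then reset the buffer.
        if self._count == 0:
            return
        averaged = {key: total / self._count for key, total in self._sums.items()}
        self.log_fn(averaged, step=step)
        self._sums.clear()
        self._count = 0


# Usage inside a training loop (assumes wandb.init() was already called):
#   logger = ThrottledLogger(wandb.log, every=50)
#   for step in range(num_steps):
#       logger.log({"loss": loss_value}, step=step)
#   logger.flush(step)   # send whatever is left in the buffer
#   wandb.finish()       # waits for pending uploads before the process exits
```

With every=50, a loop that previously made one log call per step makes roughly one call per 50 steps, at the cost of coarser curves. Whether that is enough to stay under the rate limits depends on your sweep, so treat the interval as something to tune.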
Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.