HP sweep - correct way to stop a specific agent (and not the entire sweep)


I am conducting a parameter sweep, and I use my dev machine during the night for extra compute. My problem is that I don’t know how to correctly stop the local runs. Any suggestions or best practices would be appreciated.

A related question: if I stop an agent run forcefully (for instance, by killing the process running the agent), how would the sweep controller handle the run data? Would it be removed from the dashboard? Would it be indicated in any way?


Hi Tom,

You can stop a sweep either from the W&B dashboard or by ending the agent process. In the dashboard, use the “sweep controls” option in the sweeps menu. I have attached an image of the sweep dashboard for context:

The Pause option pauses the sweep: the agent finishes its current run and then waits until you resume the sweep. You can also Stop the sweep, which waits for the current run to end and then ends the agent process.

If you choose to Cancel the sweep instead, any run in progress stops immediately, without waiting for the training cycle to end. Any metrics that have already been logged to W&B will still be visible in your dashboard.

Similarly, if you forcefully stop an agent process, that agent stops running, and any metrics it has already logged to W&B remain in the dashboard.
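For the overnight use case, one option is to launch the local agent from a small wrapper that ends the process after a fixed time, so only that agent stops and the sweep itself keeps going. The following is a minimal sketch, not an official W&B recipe: the helper name `run_until` and its parameters are made up for illustration, and it only signals the agent process itself. Whether a training run that is in flight exits cleanly at that point is something I have not verified here.

```python
import subprocess


def run_until(cmd, seconds, grace=30.0):
    """Run `cmd` as a child process and stop it after `seconds` at the latest.

    `run_until` is a hypothetical helper, not part of the wandb package.
    Returns the exit code of the child process.
    """
    proc = subprocess.Popen(cmd)
    try:
        # wait() returns early if the child finishes on its own.
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        pass

    # Ask the process to stop (SIGTERM on POSIX, TerminateProcess on Windows).
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The process ignored the request within the grace period: force it.
        proc.kill()
        return proc.wait()
```

You would call it with the agent command for your own sweep, for example `run_until(["wandb", "agent", "<entity>/<project>/<sweep_id>"], seconds=8 * 3600)`, where the sweep path is a placeholder. As described above, metrics logged before the process ends should still be visible in the dashboard.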

All the best,

Thank you for your response!

In that case, if I may suggest a feature request: since the sweep is aware of the agents currently running, it would be great to be able to kill or pause/resume a specific agent (instead of all the agents).


You are welcome! Thank you for your feature request. I will pass this information on to the engineering team.

Is there anything else I can help you with?

Thank you for your help :slight_smile: