Feature Request: Sweep agents generate new runs before consuming runs from preemption queue

Hi, I wish to have the ability for all the runs of my sweep to make simultaneous progress in a round-robin fashion with preemption enabled.

I am currently running sweeps on a SLURM cluster with preemption enabled. My runs save and restore their state, and the W&B documentation says: “Sweep agents will consume runs off the run queue until the queue is exhausted, at which point they will resume generating new runs based on the standard sweep search algorithm.”

This means that despite frequent preemptions on SLURM, the sweep controller resumes existing runs until they complete before it launches any new runs. I would instead like an option where the sweep controller prioritizes newer runs before it circles back to resuming preempted runs, i.e. one that lets all runs make progress in a round-robin fashion.
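
To make the difference concrete, here is a rough sketch in plain Python of the two scheduling orders. It is only a toy simulation and does not use any W&B API: the `simulate` function, the policy names `resume_first` and `round_robin`, and the assumptions of a single agent, a fixed number of runs, and a preemption after every time slice are all mine, for illustration.

```python
from collections import deque


def simulate(num_runs, slices_per_run, policy):
    """Return the order in which runs receive time slices.

    Toy model (all assumptions, not W&B behavior in detail):
    - one agent, so one run executes at a time
    - every time slice ends in a preemption, after which the run
      goes to the back of the preemption queue
    - the sweep has a fixed, finite number of runs
    """
    preempted = deque()                    # runs waiting to be resumed
    unlaunched = deque(range(num_runs))    # runs not started yet
    remaining = {run: slices_per_run for run in range(num_runs)}
    order = []

    while preempted or unlaunched:
        if policy == "resume_first":
            # Documented behavior: drain the preemption queue first.
            run = preempted.popleft() if preempted else unlaunched.popleft()
        elif policy == "round_robin":
            # Requested behavior: start new runs first, then cycle
            # through the preempted runs in FIFO order.
            run = unlaunched.popleft() if unlaunched else preempted.popleft()
        else:
            raise ValueError(f"unknown policy: {policy}")

        order.append(run)
        remaining[run] -= 1
        if remaining[run] > 0:
            preempted.append(run)          # preempted again, requeue

    return order


print(simulate(3, 2, "resume_first"))  # [0, 0, 1, 1, 2, 2]
print(simulate(3, 2, "round_robin"))   # [0, 1, 2, 0, 1, 2]
```

With `resume_first`, run 0 finishes before run 1 ever starts; with `round_robin`, every run gets its first slice before any run gets its second. For search methods that can generate new runs indefinitely, a real implementation would presumably also need a cap on the number of runs in flight.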

Taking this one step further, W&B should also have a feature for assigning a numerical priority to each preempted run, as well as to each unlaunched run, to provide fine-grained control over which job to prioritize. A practical benefit of this feature: if certain runs are obtaining a better validation loss than others, it would be beneficial for them to make progress faster than runs that have a worse validation loss.
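
As a sketch of what I mean by priorities, the queue below picks the next run by a numerical priority instead of FIFO order. Again this is plain Python built on the standard `heapq` module, not a W&B API: the `PriorityRunQueue` class, the run names, and the choice of using the latest validation loss as the priority (lower is scheduled sooner) are hypothetical.

```python
import heapq
from itertools import count


class PriorityRunQueue:
    """Hypothetical run queue ordered by a numerical priority.

    Lower priority values are scheduled first, so a validation loss
    can be used directly as the priority.
    """

    def __init__(self):
        self._heap = []
        self._tie_breaker = count()  # keeps FIFO order among equal priorities

    def push(self, run_id, priority):
        heapq.heappush(self._heap, (priority, next(self._tie_breaker), run_id))

    def pop(self):
        """Remove and return the run with the lowest priority value."""
        _priority, _order, run_id = heapq.heappop(self._heap)
        return run_id

    def __len__(self):
        return len(self._heap)


queue = PriorityRunQueue()
queue.push("preempted-run-a", priority=0.42)  # latest validation loss 0.42
queue.push("preempted-run-b", priority=0.31)  # latest validation loss 0.31
queue.push("unlaunched-run", priority=0.35)   # user-chosen default for new runs

schedule = [queue.pop() for _ in range(len(queue))]
print(schedule)  # ['preempted-run-b', 'unlaunched-run', 'preempted-run-a']
```

The default priority given to unlaunched runs decides how aggressively new configurations are explored relative to resuming promising preempted ones.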

Thanks for the suggestion!
I’ve forwarded it to the relevant team. :slight_smile:

Hey @nmodhe - happy to make this feature request for you, and I’ll write back in this thread with any progress that arises
