I am using multiple machines to parallelize a cross-validation over the folds. Inside my fold-training function I have implemented a wandb sweep, and what I would like to do is start an agent on each machine. Each machine will have a different fold of training/validation/test data, but I want them all to sweep through the exact same sequence of hyperparameters and write their outputs to the same sweep_id, which I impose from outside. That way I can later group the different CV runs together for the same hyperparameter sets. The issue is that I haven't found a good way to force the agents to run through the same sequences of hyperparameters. I guess, by definition, the whole reason agents are orchestrated by the same sweep controller is that they should distribute the work of testing different hyperparameter sets, instead of all computing the same thing over and over. So maybe this functionality of enforcing the same hyperparameter combinations on different agents isn't even there?
One solution would be to enforce method=grid, but I would also like to be able to use, for instance, random search. Another solution would be to somehow make the k_fold of the CV a hyperparameter, but then, with methods other than grid, I again have no guarantee that all k_fold values will be tested.
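For the grid option, the idea of making the fold index a sweep parameter can be sketched like this (a minimal sketch — the parameter names and values are made up, but the dict follows the wandb sweep-config schema; with method "grid" the controller enumerates the full Cartesian product, so every hyperparameter combination is run for every fold):

```python
# minimal sketch of a grid sweep config; parameter names/values are made up
sweep_config = {
    "method": "grid",
    "parameters": {
        "learning_rate": {"values": [1e-3, 1e-4]},
        "hidden_size": {"values": [64, 128]},
        # treating the CV fold index as just another grid dimension
        "k_fold": {"values": [0, 1, 2, 3, 4]},
    },
}

# with method="grid", every (learning_rate, hidden_size) pair is
# guaranteed to be visited once per fold
n_combinations = 1
for p in sweep_config["parameters"].values():
    n_combinations *= len(p["values"])
```

The catch, as noted above, is that this guarantee disappears as soon as the method is anything other than grid.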
It would be so cool to have a way to enforce the same random seed, with redundant computation, on agents that write to the same sweep.
Thank you for reaching out. I’ll be glad to assist you with this. We’ll investigate and get back to you with updates.
Hi @jcnumeus, please see this example of how to perform traditional k-fold cross-validation using sweeps, which is the currently supported approach.
You are correct that you can’t force multiple agents to reproduce the same sequence of hyperparameters. Our internal sweep controller lines up hyperparameter sets to test and feeds them to the agent(s). There is an open feature request for setting random sweep seeds; however, it isn’t being considered by our sweeps team at this time. If this changes we’ll be sure to make an announcement. Please let us know if you have any other questions.
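Since the controller itself can’t be seeded, one workaround (outside of W&B entirely — this is plain Python, and the parameter ranges below are made up) is to generate the random-search sequence yourself from a fixed seed, so that every machine reproduces the identical list of configs:

```python
import random

def sample_configs(n, seed=0):
    """Draw n random-search configs deterministically from a fixed seed."""
    rng = random.Random(seed)  # same seed -> same sequence on every machine
    configs = []
    for _ in range(n):
        configs.append({
            "learning_rate": 10 ** rng.uniform(-5, -2),  # log-uniform draw
            "hidden_size": rng.choice([64, 128, 256]),
            "dropout": rng.uniform(0.0, 0.5),
        })
    return configs

# every fold/machine calls this with the same seed and gets identical configs
assert sample_configs(5, seed=42) == sample_configs(5, seed=42)
```

Each machine would then iterate over this list for its own fold, logging runs to a common project/group rather than relying on the sweep controller to hand out configs.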
Thanks for your replies! That CV example looks similar to what I’m currently doing, but I don’t fully understand whether it executes the same parameter sets for each fold — that is exactly what I need. I’m thinking: couldn’t this also be done by calling wandb.config a number of times (i.e. the number of grid points I want to search) beforehand and appending all those configs as dicts to a queue? Then I could always send the same config to the tasks running for the different folds and overwrite the config at the end of the run with wandb.config.update. Something along the lines of:
```python
# task on remote machine
def train():
    config = configs.pop()
    # load data for fold k
    # train model
    config['k'] = k

wandb.agent(sweep_id, train, count=len(configs))
```

```python
import wandb

# before starting the remote workers...
sweep_id = wandb.sweep(sweep_config)
configs = []

# this should populate my list of configs
wandb.agent(sweep_id, get_config, count=50)

# start up remote tasks: one task per k fold that trains all configs sequentially
for k in range(k_folds):
    remote.map(remote_task, (configs, k))
```
Wouldn’t that, in a hacky way, enforce running the same beforehand-generated configs across each fold? I could even add 'k': [None] to my sweep_config, so that it appears in the sweep parameter plots, and then, due to the config overwrite, it should be correctly displayed as an integer in the plot?
Not sure whether I’m missing anything here. Maybe some of the stats & plots on the wandb dashboard get messed up if I do it like that?
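Setting the wandb pieces aside, the queue idea itself can be sketched in plain Python (the grid, fold count, and `run_fold` stand-in below are made up). One detail worth noting: popping from a shared list consumes it, so iterating the list per fold is what actually guarantees every fold sees the identical sequence:

```python
from itertools import product

# pre-generate the configs once, before any fold starts
grid = {"lr": [1e-3, 1e-4], "hidden": [64, 128]}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

def run_fold(k, configs):
    """Stand-in for the remote task: runs every config on fold k."""
    results = []
    for config in configs:             # iterate, don't pop -- keeps the list intact
        config = {**config, "k": k}    # tag the run with its fold index
        results.append(config)         # (training would happen here)
    return results

k_folds = 3
all_results = [run_fold(k, configs) for k in range(k_folds)]

# every fold saw the same hyperparameter sequence, differing only in 'k'
for fold_results in all_results:
    stripped = [{key: v for key, v in c.items() if key != "k"} for c in fold_results]
    assert stripped == configs
```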