How to run wandb.sweep in Offline mode

pparv056 · June 2, 2024, 11:22pm

Hi,

I need to run my code on a cluster in offline mode. Even though I specify wandb offline in the batch file, I encounter an error when the code reaches wandb.sweep(). The error message is:

wandb: Network error (ConnectionError), entering retry loop.

How can I make wandb.sweep work offline? Below is my code:

import subprocess

import click
import wandb
import yaml


@click.command()
@click.argument("sweep_config")
@click.argument("train_file")
@click.argument("project_name")
@click.argument("entity")
def main(sweep_config, train_file, project_name, entity):
    wandb.init(project=project_name, mode='offline')

    # Load the sweep configuration and point it at the training script.
    with open(sweep_config) as file:
        config_dict = yaml.load(file, Loader=yaml.FullLoader)
    config_dict['program'] = train_file

    # count_hyperparams is my own helper (not shown in this post).
    num = count_hyperparams(config_dict)
    print("Number of hyperparameter configurations: %s" % num)

    # This is the call that fails with the ConnectionError.
    sweep_id = wandb.sweep(config_dict, project=project_name, entity=entity)

    # Submit one agent job per configuration, e.g.
    # sbatch --export=A=5,b='test' jobscript.sbatch
    for _ in range(num):
        subprocess.Popen(['sbatch',
            '--export=sweep_id=%s,project_name=%s,entity=%s' % (sweep_id, project_name, entity),
            'slurm_run/wandb_agent.sbatch'
        ])

if __name__ == '__main__':
    main()

system · June 2, 2024, 11:25pm

Hello, this is a reply from our support bot designed to assist you with your Weights & Biases related queries. To reach a human, please reply to this message.

Running wandb.sweep in offline mode is difficult because sweeps coordinate hyperparameter searches across multiple runs, which typically requires communication with the Weights & Biases server. Based on the context provided, there is no direct support for executing wandb.sweep entirely offline: it relies on the W&B server to manage the sweep configuration and distribute tasks to agents.

The error you’re encountering (wandb: Network error (ConnectionError), entering retry loop.) occurs because wandb.sweep attempts to communicate with the W&B server to create the sweep and fails due to the offline mode setting.

Here’s a workaround approach for running experiments in a somewhat “offline” manner, but it’s important to note that this doesn’t utilize the wandb.sweep function directly because of its online nature:

1. Prepare Sweep Configuration Locally: Before moving to the cluster, define your sweep configuration as you normally would and use the W&B interface to create a sweep. This step requires internet access but can be done from your local machine. Note the sweep_id generated by W&B.
2. Manual Sweep Execution: Instead of using wandb.sweep and wandb.agent to automatically manage your sweep, manually iterate over your desired hyperparameter configurations in your code. For each configuration:
   - Initialize a W&B run with wandb.init(mode='offline', ...) to ensure it operates in offline mode.
   - Manually set the hyperparameters in wandb.config based on the current iteration’s configuration.
   - Execute your training script with these hyperparameters.
   - Save the run’s outputs and logs as usual with wandb.log.
3. Post-Processing: After completing all runs, you’ll have a collection of offline run directories. You can then upload these runs to the W&B server when internet access is available using the wandb sync command. This step won’t recreate the sweep on the server but will upload the runs to the specified project.
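
The manual execution step above could look roughly like the sketch below. It is only an illustration under stated assumptions: expand_grid and run_offline are hypothetical helper names, train_fn stands in for your own training function and is assumed to return a dict of metrics, and the search space is made up. Only wandb.init, wandb.log and run.finish are actual W&B calls.

```python
import itertools


def expand_grid(parameters):
    """Expand a sweep-style 'parameters' dict into a list of concrete configs.

    Only entries of the form {'values': [...]} are handled in this sketch.
    """
    names = list(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def run_offline(configs, train_fn, project):
    """Run train_fn once per config, recording each as an offline W&B run."""
    import wandb  # imported here so expand_grid works without wandb installed

    for config in configs:
        # mode="offline" writes the run to a local directory instead of the server.
        run = wandb.init(project=project, config=config, mode="offline")
        try:
            metrics = train_fn(config)  # hypothetical: returns a dict of metrics
            wandb.log(metrics)
        finally:
            run.finish()


# Hypothetical search space, for illustration only.
parameters = {"lr": {"values": [1e-4, 1e-3]}, "batch_size": {"values": [32, 64]}}
configs = expand_grid(parameters)
```

Calling run_offline(configs, my_train_fn, "my-project") on the cluster would then leave one offline run directory per configuration, which can later be uploaded from a machine with network access using wandb sync, for example wandb sync wandb/offline-run-*.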

This approach requires manual management of hyperparameter configurations and doesn’t leverage the full power of W&B sweeps, such as automatic hyperparameter optimization algorithms (e.g., Bayesian optimization). However, it allows you to conduct multiple runs in an offline environment and later sync them to W&B for analysis.
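
If enumerating every configuration is not practical, one simple way to manage configurations by hand is to draw them at random. The sketch below is an assumption-laden illustration and not a replacement for the W&B Bayesian search: sample_config is a hypothetical helper, the search space is made up, and sampling min/max ranges log-uniformly is a choice made here for learning-rate-like parameters.

```python
import math
import random


def sample_config(parameters, rng):
    """Draw one configuration from a sweep-style 'parameters' dict.

    {'values': [...]} entries are sampled uniformly from the list;
    {'min': a, 'max': b} entries are sampled log-uniformly (a choice made
    for this sketch; it requires a > 0).
    """
    config = {}
    for name, spec in parameters.items():
        if "values" in spec:
            config[name] = rng.choice(spec["values"])
        else:
            low, high = math.log(spec["min"]), math.log(spec["max"])
            config[name] = math.exp(rng.uniform(low, high))
    return config


# Hypothetical search space, for illustration only.
parameters = {"lr": {"min": 1e-6, "max": 1e-2}}
rng = random.Random(0)  # fixed seed so the drawn configurations are reproducible
samples = [sample_config(parameters, rng) for _ in range(10)]
```

Each sampled dict can be passed as config to wandb.init(mode='offline', ...) in the same way as a hand-written configuration.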

For your specific use case, consider revising the workflow to avoid calling wandb.sweep in an environment where network access is not available. Instead, focus on preparing your experiments beforehand and managing the hyperparameter iterations manually within your script.

Unfortunately, the provided context does not include direct instructions or support for running wandb.sweep in a completely offline mode, reflecting the inherent design of sweeps as an online feature. For more detailed guidance or updates on offline capabilities, I recommend reaching out to the Weights & Biases support or checking the W&B Community forums for any new developments or workarounds shared by the community.

To reach a human please reply to this message.

-WandBot

jason-arkens17 · June 11, 2024, 4:55pm

Hi Payam,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

jason-arkens17 · June 12, 2024, 11:14pm

Hi Payam, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

kinzhang · July 26, 2024, 8:13pm

I found another post on this from two years ago: https://community.wandb.ai/t/how-to-run-wandb-sweep-in-offline-mode/6829

I checked this doc: Search and stop algorithms locally with W&B agents

But like @pparv056, I’m using a Python script, without wandb.controller. Here is my sweep.py file:

import wandb

# main is my training function (not shown here).
sweep_config = {
    'method': 'bayes',
    'name': 'deflow_lr',
    'metric': {
        'goal': 'minimize',
        'name': 'val/EPE_FD'
    },
    'parameters': {
        'lr': {'max': 1e-2, 'min': 1e-6},
    }
}
sweep_id = wandb.sweep(sweep_config, project="lr_sweep")
wandb.agent(sweep_id=sweep_id, function=main, count=10)

Where can I set the controller to local?
