Wrong cmd line args for hydra when agent resumes preempted runs in a sweep

migl · April 17, 2024, 4:02pm

Hi,

I’ve been through the docs and other forum entries but cannot find a solution, apologies if I missed something.

TLDR: It seems that when an agent is resuming a run, it passes additional arguments to the python script that don’t follow the ${args_no_hyphens} rule specified in the sweep config, thereby crashing the script relying on hydra.

I have a script that uses hydra for the default configuration. I want to use wandb for sweeps which should ultimately run on SLURM, hence I need robustness to preemption.
Currently I’m trying to get it working locally.

I do the following:

1. Start a sweep (grid search with 6 runs). For compatibility with hydra, I specify command: [${env}, python, ${program}, ${args_no_hyphens}].
2. Start an agent.
3. Preempt one run (using wandb.mark_preempting()) and kill it using kill -SIGKILL <pid>. The run shows up as “preempted” (not “preempting”!) on the webpage.

As expected, the agent queues and resumes the run (weirdly at the end of the queue, not the front). However, when resuming the run, the agent doesn’t launch it as it did initially, with args passed without hyphens, but with a whole lot of additional info.
For comparison, a new run looks like this:

2024-04-17 17:49:26,695 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py experiment.target=3 string=bla

And a resumed run looks like this:

2024-04-17 18:01:29,939 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface', 'is_jupyter_run': False, 'is_kaggle_kernel': False, 'python_version': '3.11.7', 'start_time': 1713368839, 't': {'1': [1, 50, 55], '13': 'linux-x86_64', '2': [1, 11, 12, 49, 50, 55], '3': [1, 3, 5, 16, 23], '4': '3.11.7', '5': '0.16.6', '8': [5]}}" "experiment={'input': 1, 'target': 1}" experiment.target=1 string=blub "wandb={'entity': 'migl', 'project': 'migl_test_project'}"

This crashes hydra, which doesn’t know how to deal with these arguments.

My questions:

Is there a workaround?
What is all this information used for? As a workaround, could I try to remove it before getting the args parsed by hydra?

Thank you for your help!

artsiom · April 17, 2024, 8:33pm

Hi @migl! Thank you for reaching out!

I apologize for the issue you’re experiencing. We’ve recently received similar feedback from other users and are now treating these extra arguments as a bug.

> What is all this information used for? As a workaround, could I try to remove it before getting the args parsed by hydra?

You’re absolutely right. The current suggested workaround involves ignoring unknown arguments. You can do this in argparse with args, _ = parser.parse_known_args(). This should allow the program to run smoothly when you manually pass it the command that W&B generated, and then filter the input before it enters Hydra.

Warmly,
Artsiom

migl · April 18, 2024, 9:04am

Thank you for your reply, removing the additional arguments seems to work.

However, just in case someone else runs into this: I wasn’t able to make use of the proposed argparse solution. Hydra uses a positional argument to catch the overrides (see here), so argparse treats all arguments as known, and only the hydra-internal parsing that happens later fails.
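A minimal sketch of why parse_known_args() does not help here. The parser below is a simplified stand-in with a catch-all positional argument, written for illustration only; it is an assumption and not hydra's actual parser:

```python
import argparse

# Simplified stand-in parser (hypothetical): one positional argument
# that collects every override, similar in spirit to what is described above.
parser = argparse.ArgumentParser()
parser.add_argument("overrides", nargs="*")

argv = ["experiment.target=1", "_wandb={'cli_version': '0.16.6'}"]
args, unknown = parser.parse_known_args(argv)

# Neither argument starts with a hyphen, so argparse treats both as
# positional values and nothing is reported as unknown.
print(args.overrides)
print(unknown)
```

Because the dictionary-style argument is accepted as just another positional value, it only gets rejected later, when the overrides themselves are parsed.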

Instead, I did something super hacky in the meantime, hoping this issue will be fixed in wandb soon:

    import sys

    new_cmd_line_args = []

    for arg in sys.argv:
        # Remove all 'dictionary' arguments, e.g. "_wandb={'cli_version': ...}"
        if "{" in arg and ":" in arg:
            continue
        new_cmd_line_args.append(arg)
    # Replace sys.argv before hydra parses it
    sys.argv = new_cmd_line_args

This relies on the fact that hydra-arguments (at least in my case so far, ymmv) don’t include dictionaries of the style "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface', ...}".
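The same idea can be sketched as a small function, which makes it easier to check against the resumed command from the agent log. The names is_dict_arg and drop_dict_args are hypothetical, and the stricter test (the value after the first "=" starts with "{" and ends with "}") is an assumption about what the extra arguments look like, based only on the log line quoted earlier in this thread:

```python
def is_dict_arg(arg):
    """Return True for arguments of the form key={...} (hypothetical helper)."""
    key, sep, value = arg.partition("=")
    return bool(sep) and value.startswith("{") and value.endswith("}")


def drop_dict_args(argv):
    """Return a copy of argv without dictionary-style arguments."""
    return [arg for arg in argv if not is_dict_arg(arg)]


# Shortened version of the resumed command from the agent log.
resumed = [
    "wandb_main.py",
    "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface'}",
    "experiment={'input': 1, 'target': 1}",
    "experiment.target=1",
    "string=blub",
    "wandb={'entity': 'migl', 'project': 'migl_test_project'}",
]
print(drop_dict_args(resumed))
```

To use it, something like sys.argv = drop_dict_args(sys.argv) would have to run before the hydra-decorated main function is called. As with the original snippet, this breaks if a legitimate hydra override is itself a dictionary.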

artsiom · April 18, 2024, 3:28pm

Thank you so much for providing your personal workaround! This is great and from now on, I’ll suggest your solution in case someone boots up preempted runs with hydra.

I will follow up in here in case we have any more questions regarding this thread. I will also let you know once the concern is fixed.

Cheers!
Artsiom

mwalters · April 30, 2024, 6:47pm

Hey folks. This is no doubt an increasingly popular issue as hydra and wandb gain popularity.
@artsiom , this may also help illuminate a solution. Hydra’s main gripe is with the quotations.
See here, for how they like the formatting.

In my own debugging I found that the wandb agent log mentions:

About to run command: /usr/bin/env python main.py "finetune={'batch_size': 48, 'lr': 0.08840240823868173}"

which hydra complains about. But after removing the inner quotes around the keys, I can run the command successfully (outside of an agent):

python main.py --config-name=mnist "finetune={batch_size: 48, lr: 0.08840240823868173}"

Then, for wandb sweep and wandb agent, I modify the workaround above to selectively eliminate single quotes:

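    import sys

    new_cmd_line_args = []
    for arg in sys.argv:
        # Try and catch the wandb agent formatted args, e.g. "finetune={'lr': ...}"
        if "={" in arg:
            # Strip the single quotes so hydra can parse the dictionary
            arg = arg.replace("'", "")
        new_cmd_line_args.append(arg)
    # Replace sys.argv before hydra parses it
    sys.argv = new_cmd_line_args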
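A hedged variant of the snippet above, written as a function so it can be checked against the command from the agent log. The name unquote_dict_args is hypothetical. It only touches the value part of key={...} arguments, and it assumes, like the original, that no value legitimately contains a single quote:

```python
def unquote_dict_args(argv):
    """Strip single quotes from the value of key={...} style arguments."""
    cleaned = []
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and value.startswith("{"):
            # Leave the key untouched; remove quotes only inside the dict.
            arg = key + sep + value.replace("'", "")
        cleaned.append(arg)
    return cleaned


logged = ["main.py", "finetune={'batch_size': 48, 'lr': 0.08840240823868173}"]
print(unquote_dict_args(logged))
```

The result has the same form as the command that ran successfully outside of an agent. It would need to be applied to sys.argv before the hydra-decorated main function is called.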

Cheers,
Michael

artsiom · June 20, 2024, 5:04pm

Thank you so much for the follow up @mwalters! This is great.

I have also just been notified that our engineering team removed the _wandb argument when preempting runs, and you guys should not be seeing this behavior in the next version of wandb.

Thank you so much for your patience regarding this.

tbartley · March 7, 2025, 1:53am

Hey y’all, this issue is still going on. What was the fix that engineering did that was supposed to clear this up?
