Wrong cmd line args for Hydra when agent resumes preempted runs in a sweep

Hi,

I’ve been through the docs and other forum entries but couldn’t find a solution; apologies if I missed something.

TL;DR: It seems that when an agent resumes a run, it passes additional arguments to the Python script that don’t follow the ${args_no_hyphens} format specified in the sweep config, which crashes the script because it relies on Hydra.

I have a script that uses Hydra for its default configuration. I want to use wandb for sweeps that will ultimately run on SLURM, so the runs need to be robust to preemption.
Currently I’m trying to get it working locally.

I do the following:

  • Start a sweep (grid search with 6 runs). For compatibility with Hydra, I specify command: [${env}, python, ${program}, ${args_no_hyphens}]
  • Start an agent
  • Preempt one run (using wandb.mark_preempting()) and kill it using kill -SIGKILL <pid>. The run shows up as “preempted” (not “preempting”!) on the webpage.
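For reference, a sweep like the one described above could be written as the Python dict accepted by wandb.sweep(). This is only a sketch: the parameter names and values are assumptions reconstructed from the agent logs in this post (the original config is not shown); only the command entry is taken from the post.

```python
# Hypothetical sweep configuration (parameter names/values are assumptions
# inferred from the agent logs; only "command" comes from the post).
sweep_config = {
    "program": "wandb_main.py",
    "method": "grid",
    "parameters": {
        # 3 x 2 = 6 grid points (assumed values)
        "experiment.target": {"values": [1, 2, 3]},
        "string": {"values": ["bla", "blub"]},
    },
    # ${args_no_hyphens} expands each parameter as key=value without "--",
    # which is the override syntax Hydra expects.
    "command": ["${env}", "python", "${program}", "${args_no_hyphens}"],
}

# Creating the sweep needs a W&B login, so it is left commented out here:
# sweep_id = wandb.sweep(sweep=sweep_config, project="migl_test_project")
```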

As expected, the agent queues and resumes the run (oddly at the end of the queue, not the front). However, when it resumes the run, it doesn’t launch it the way it launched the original run (with the args passed without hyphens); instead it adds a lot of additional arguments.
For comparison, a new run looks like this:

2024-04-17 17:49:26,695 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py experiment.target=3 string=bla

And a resumed run looks like this:

2024-04-17 18:01:29,939 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface', 'is_jupyter_run': False, 'is_kaggle_kernel': False, 'python_version': '3.11.7', 'start_time': 1713368839, 't': {'1': [1, 50, 55], '13': 'linux-x86_64', '2': [1, 11, 12, 49, 50, 55], '3': [1, 3, 5, 16, 23], '4': '3.11.7', '5': '0.16.6', '8': [5]}}" "experiment={'input': 1, 'target': 1}" experiment.target=1 string=blub "wandb={'entity': 'migl', 'project': 'migl_test_project'}"

This crashes Hydra, which doesn’t know how to parse these extra arguments.
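To see what the script actually receives, the logged command can be split the way a POSIX shell would, using Python's shlex. This assumes the log line is a shell-quoted rendering of the argument list, so each double-quoted chunk becomes one sys.argv entry; the long _wandb dict is shortened to its first two entries for readability.

```python
import shlex

# Resumed-run command from the agent log (the _wandb dict is shortened).
RESUMED_CMD = (
    "/usr/bin/env python wandb_main.py "
    "\"_wandb={'cli_version': '0.16.6', 'framework': 'huggingface'}\" "
    "\"experiment={'input': 1, 'target': 1}\" "
    "experiment.target=1 string=blub "
    "\"wandb={'entity': 'migl', 'project': 'migl_test_project'}\""
)

# Double-quoted chunks stay single tokens; the inner single quotes survive.
tokens = shlex.split(RESUMED_CMD)
# Drop "/usr/bin/env" and "python": the rest is what the script sees as sys.argv.
argv = tokens[2:]

for arg in argv:
    print(arg)
```

Only experiment.target=1 and string=blub match the ${args_no_hyphens} format; the three dict-literal entries are the extra ones.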

My questions:

  • Is there a workaround?
  • What is all this information used for? As a workaround, could I try to remove it before getting the args parsed by hydra?

Thank you for your help!

Hi @migl! Thank you for reaching out!

I apologize for the issue you’re experiencing. We’ve recently received similar feedback from other users and are now treating these extra arguments as a bug.

  • What is all this information used for? As a workaround, could I try to remove it before getting the args parsed by hydra?

You’re absolutely right. The currently suggested workaround is to ignore unknown arguments, which you can do in argparse with args, _ = parser.parse_known_args(). This should let the program run smoothly when you manually pass it the command that W&B generated, and filter the input before it reaches Hydra.
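As an illustration of that suggestion, here is a minimal sketch for a plain argparse script (not Hydra). The --lr option is a hypothetical example, and the extra entries are shortened versions of the ones in the log. parse_known_args() returns the leftovers instead of raising an error on them.

```python
import argparse

# Plain argparse script with one hypothetical option.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)

# A recognised option plus extra entries shaped like the ones in the log.
argv = ["--lr=0.1", "_wandb={'cli_version': '0.16.6'}", "wandb={'entity': 'migl'}"]

# Returns (namespace, leftovers) instead of exiting on unrecognised arguments.
args, unknown = parser.parse_known_args(argv)
print(args.lr)  # 0.1
print(unknown)
```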

Warmly,
Artsiom

Thank you for your reply, removing the additional arguments seems to work.

However, in case someone else runs into this: I wasn’t able to use the proposed argparse solution. Hydra uses a positional argument to catch the overrides (see here), so argparse treats all arguments as known, and only Hydra’s internal parsing fails later.
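The problem can be reproduced with argparse alone. The parser below only mimics, as an assumption, how the post describes Hydra collecting overrides (a positional argument with nargs="*"); it is not Hydra's actual parser.

```python
import argparse

# Stand-in for a parser that collects overrides positionally.
parser = argparse.ArgumentParser()
parser.add_argument("overrides", nargs="*")

argv = [
    "_wandb={'cli_version': '0.16.6'}",
    "experiment={'input': 1, 'target': 1}",
    "experiment.target=1",
    "string=blub",
]

args, unknown = parser.parse_known_args(argv)
# The positional swallows every token, so nothing ends up in "unknown";
# the failure only appears later, when each override string is parsed.
print(len(args.overrides), unknown)  # 4 []
```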

Instead, I did something super hacky in the meantime, hoping this issue will be fixed in wandb soon:

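    import sys

    new_cmd_line_args = []

    for arg in sys.argv:
        # Remove all 'dictionary' arguments, e.g. "_wandb={...}" added on resume
        if "{" in arg and ":" in arg:
            continue
        new_cmd_line_args.append(arg)
    # This has to run before Hydra parses the command line
    sys.argv = new_cmd_line_args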

This relies on the fact that Hydra arguments (at least in my case so far, YMMV) don’t include dictionaries of the style "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface', ...}".
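One way to package the loop above is as a small function applied before the Hydra entry point runs. As far as I know, a function decorated with @hydra.main reads sys.argv when it is called, so the cleanup has to happen first. The function name strip_dict_args and the commented-out main() are hypothetical, and Hydra itself is not imported here.

```python
import sys


def strip_dict_args(argv):
    """Drop argv entries that look like dict literals, e.g. "_wandb={...}".

    Same heuristic as the loop above: any entry containing both "{" and ":".
    It would also drop a genuine Hydra dict override such as "a={b: 1}".
    """
    return [arg for arg in argv if not ("{" in arg and ":" in arg)]


if __name__ == "__main__":
    # Clean sys.argv first, then hand control to the Hydra entry point.
    sys.argv = strip_dict_args(sys.argv)
    # main()  # hypothetical function decorated with @hydra.main(...)
```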

Thank you so much for providing your personal workaround! This is great, and from now on I’ll suggest your solution whenever someone resumes preempted runs with Hydra.

I will follow up in here in case we have any more questions regarding this thread. I will also let you know once the concern is fixed.

Cheers!
Artsiom

Hey folks. This is no doubt an increasingly common issue as Hydra and wandb gain popularity.
@artsiom, this may also help illuminate a solution. Hydra’s main gripe is with the quotes.
See here for how they like the formatting.

In my own debugging I found that the wandb agent log mentions:

About to run command: /usr/bin/env python main.py "finetune={'batch_size': 48, 'lr': 0.08840240823868173}"

which Hydra complains about. But after removing the inner quotes around the keys, I can run the command successfully (outside of an agent):

python main.py --config-name=mnist "finetune={batch_size: 48, lr: 0.08840240823868173}"

Then, for wandb sweep and wandb agent, I modified the workaround above to selectively strip single quotes:

    import sys

    new_cmd_line_args = []
    for arg in sys.argv:
        # Try and catch the wandb agent formatted args, e.g. "finetune={'lr': 0.1}",
        # and strip the single quotes that Hydra rejects
        if "={" in arg:
            arg = arg.replace("'", "")
        new_cmd_line_args.append(arg)
    sys.argv = new_cmd_line_args
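Applied to the argument from the agent log, this rule turns the logged form into the one that ran successfully by hand. The helper name unquote_dict_arg is hypothetical; it applies the same per-argument logic as the loop. Note that it strips every single quote in a matching entry, including any that belong to a string value.

```python
def unquote_dict_arg(arg):
    """Apply the quote-stripping rule from the loop above to one argv entry."""
    # Only touch entries shaped like key={...}; plain overrides pass through.
    if "={" in arg:
        return arg.replace("'", "")
    return arg


logged = "finetune={'batch_size': 48, 'lr': 0.08840240823868173}"
print(unquote_dict_arg(logged))
# finetune={batch_size: 48, lr: 0.08840240823868173}
print(unquote_dict_arg("string=bla"))  # string=bla
```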

Cheers,
Michael

Thank you so much for the follow-up @mwalters! This is great.

I have also just been notified that our engineering team removed the _wandb argument when preempting runs, so you should no longer see this behavior in the next version of wandb.

Thank you so much for your patience regarding this.