Hi,
I’ve been through the docs and other forum entries but cannot find a solution; apologies if I missed something.
TLDR: It seems that when an `agent` is resuming a run, it passes additional arguments to the Python script that don’t follow the `${args_no_hyphens}` rule specified in the sweep config, thereby crashing the script relying on `hydra`.
I have a script that uses `hydra` for the default configuration. I want to use wandb for sweeps which should ultimately run on SLURM, hence I need robustness to preemption.
Currently I’m trying to get it working locally.
I do the following:
- Start a sweep (grid search with 6 runs). For compatibility with `hydra` I specify `command: [${env}, python, ${program}, ${args_no_hyphens}]`
- Start an agent
- Preempt one run (using `wandb.mark_preempting()`) and kill it using `kill -SIGKILL <pid>`. The run shows up as “preempted” (not “preempting”!) on the webpage.
As expected, the agent queues and resumes the run (weirdly at the end of the queue, not the front). However, when it resumes the run, it doesn’t launch it the way it did initially, with only the hyphen-free `key=value` arguments, but with a whole lot of additional info appended to the command.
For comparison, a new run looks like this:
2024-04-17 17:49:26,695 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py experiment.target=3 string=bla
And a resumed run looks like this:
2024-04-17 18:01:29,939 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python wandb_main.py "_wandb={'cli_version': '0.16.6', 'framework': 'huggingface', 'is_jupyter_run': False, 'is_kaggle_kernel': False, 'python_version': '3.11.7', 'start_time': 1713368839, 't': {'1': [1, 50, 55], '13': 'linux-x86_64', '2': [1, 11, 12, 49, 50, 55], '3': [1, 3, 5, 16, 23], '4': '3.11.7', '5': '0.16.6', '8': [5]}}" "experiment={'input': 1, 'target': 1}" experiment.target=1 string=blub "wandb={'entity': 'migl', 'project': 'migl_test_project'}"
This crashes `hydra`, which doesn’t know how to deal with these extra arguments.
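To make the expectation concrete, here is a minimal sketch of the expansion I assumed `${args_no_hyphens}` performs: each sweep parameter becomes one `key=value` token. The helper names `expand_args_no_hyphens` and `build_command` are hypothetical and only illustrate my mental model; this is not the agent's actual implementation.

```python
import shlex


def expand_args_no_hyphens(params):
    """Format each parameter as a hyphen-free `key=value` token (assumed behaviour)."""
    return [f"{key}={value}" for key, value in params.items()]


def build_command(program, params, env="/usr/bin/env"):
    """Assemble the command list in the order given in the sweep config."""
    return [env, "python", program, *expand_args_no_hyphens(params)]


# Parameters of the new run shown in the first log line.
command = build_command("wandb_main.py", {"experiment.target": 3, "string": "bla"})
print(shlex.join(command))
```

Formatting a nested value the same way, for example `{"experiment": {"input": 1, "target": 1}}`, yields a token shaped like the extra ones in the resumed command, though I do not know whether that is what the agent actually does.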
My questions:
- Is there a workaround?
- What is all this information used for? As a workaround, could I try to remove it before getting the args parsed by `hydra`?
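The kind of filtering meant in the second question could look roughly like the sketch below. It drops any argument whose key starts with `_wandb` or whose value parses as a Python dict literal, and keeps everything else. The function names are hypothetical, and the rule is only inferred from the single resumed command shown above, so it may miss other forms the agent can emit.

```python
import ast


def is_plain_override(arg):
    """Return True for simple `key=value` overrides, False for the extra dict-valued tokens."""
    key, sep, value = arg.partition("=")
    if not sep:
        return True  # not a key=value token; leave it for hydra to handle
    if key.startswith("_wandb"):
        return False  # internal metadata token seen in the resumed command
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return True  # plain strings such as `blub` are not Python literals
    return not isinstance(parsed, dict)


def strip_resume_args(argv):
    """Keep the program name and only the arguments that look like plain overrides."""
    return [argv[0]] + [arg for arg in argv[1:] if is_plain_override(arg)]
```

The idea would be to run `sys.argv = strip_resume_args(sys.argv)` before calling the `hydra`-decorated entry point, assuming `hydra` only reads `sys.argv` at that point. I have not verified whether discarding these arguments breaks anything the resumed run depends on.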
Thank you for your help!