Generating a local, concrete set of values for a sweep without logging remotely?

I want to have a single sweep config and NOT log to the platform while debugging. I have already specified the sweep config, so I want to use wandb’s functionality to create a specific set of hyperparameters (hps) from it for a debug run. How do I do this?

Related: What is the official way to run a wandb sweep with Hugging Face (HF) transformers?

Hi @brando, thanks for writing in! It sounds like, for this testing purpose, you would want to use our sweeps package to mock the sweep server. You can install it with pip install sweeps. A minimal code example to get the runs and their configs is the following:

from sweeps import next_runs

sweep_config = {
    'method': 'grid',
    'parameters': {'a': {'values': [1, 2, 3]}},
}

suggested_runs = next_runs(sweep_config, [], n=3)
print(suggested_runs)

for sr in suggested_runs:
    print(sr.config)
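
If you only need a single concrete configuration (e.g., for one debug run), the same package also exposes next_run, which suggests one run at a time. A minimal sketch, assuming the same config format as above (for a grid sweep it should return None once every combination has been suggested):

from sweeps import next_run

single_run = next_run(sweep_config, [])  # [] = no previous runs yet
print(single_run.config)  # e.g. {'a': {'value': 1}}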

You can find more examples in the sweeps repo. Would this work for your use case?

Hi @brando, just checking in to see if you’ve tried the above suggestion, and whether it works for your use case. Thanks!

Hi thanos! I had already found a solution that worked for me (I think), so I have not tested your suggestion. I will check it out soon and see how to incorporate it into my workflow (if possible). But it does add a new dependency :frowning:

My current solution can be found here: machine learning - What is the official way to run a wandb sweep with hugging face (HF) transformers so that all the HF features work e.g. distributed training? - Stack Overflow

but I will copy-paste it here just in case:

Although it is possible in principle to combine it with configs, I decided against it. The reasons are:

  1. Simplicity. It’s better for the code to be simple so anyone can easily re-use it, and more time is spent on what matters most (ML research).
  2. The code already has its arguments in the config. They don’t have to be repeated in the argparse. Not only is that redundant, but you also have to maintain two sets of code, which leads to more bugs, less time for research, etc.
  3. If I want to debug, I don’t log to wandb at all and just call the train script. That is simple. For the code to be consistent, I load a debug config file (a disadvantage, because I need to maintain two config files; but given that wandb does not have a feature to create a concrete debug config for now, this is OK, and it has been requested: Generating a local, concrete set of values for a sweep without logging remotely?)

The code with a demo can be found at the link above; the most crucial code is copied below in case the links die:

from argparse import Namespace
from pathlib import Path
from typing import Union

import wandb
import yaml
from wandb.sdk.lib import RunDisabled
from wandb.sdk.wandb_run import Run

import uutils

from pdb import set_trace as st



def get_sweep_url_from_run(run: Run) -> str:
    """ https://stackoverflow.com/questions/75852199/how-do-i-print-the-wandb-sweep-url-in-python/76624367#76624367 """
    return run.get_sweep_url()


def get_sweep_url_from_config(sweep_config: dict, sweep_id: str) -> str:
    sweep_url = f"Sweep URL: https://wandb.ai/{sweep_config['entity']}/{sweep_config['project']}/sweeps/{sweep_id}"
    return sweep_url


def get_sweep_url_from_entity_project_sweep_id(entity: str, project: str, sweep_id: str) -> str:
    """

    https://wandb.ai/{username}/{project}/sweeps/{sweep_id}
    """
    api = wandb.Api()
    sweep = api.sweep(f'{entity}/{project}/{sweep_id}')
    return sweep.url


def get_sweep_config(path2sweep_config: str) -> dict:
    """ Get sweep config from path """
    config_path = Path(path2sweep_config).expanduser()
    with open(config_path, 'r') as file:
        sweep_config = yaml.safe_load(file)
    return sweep_config


def exec_run_for_wandb_sweep(path2sweep_config: str,
                             function: callable,
                             ) -> str:
    """
    Run a standard sweep from a config file. Given a correctly set up train function, it will run the sweep in the standard way.
    Note: if entity and project are None, wandb might try to infer them and the call might fail. If you want
    a debug mode, set wandb.init(mode='dryrun'); to log to the wandb platform, use mode='online'.
    You need to set the mode in your train file correctly yourself, e.g., train = lambda: train(args), or put mode in
    the wandb_config; note that mode is given to init, so you'd need to read that field from a file and not from
    wandb.config (since you haven't initialized wandb yet).

    e.g.
        path2sweep_config = '~/ultimate-utils/tutorials_for_myself/my_wandb_uu/my_wandb_sweeps_uu/sweep_in_python_yaml_config/sweep_config.yaml'

    Important remark:
        - run = wandb.init() and run.finish() are called inside the train function.
    """
    # -- 1. Define the sweep configuration in a YAML file and load it in Python as a dict.
    sweep_config: dict = get_sweep_config(path2sweep_config)

    # -- 2. Initialize the sweep in Python; this creates it under your entity/project on the wandb platform and returns the sweep_id.
    sweep_id = wandb.sweep(sweep_config, entity=sweep_config.get('entity'), project=sweep_config.get('project'))
    print(f'wandb sweep url (uutils): {get_sweep_url_from_config(sweep_config, sweep_id)}')

    # -- 3. Finally, once the sweep_id is acquired, execute the sweep using the desired number of agents in python.
    wandb.agent(sweep_id, function=function, count=sweep_config.get('run_cap'))  # train does wandb.init(), run.finish()
    return sweep_id


def setup_wandb_for_train_with_hf_trainer(args: Namespace,
                                          ) -> tuple[wandb.Config, Union[Run, RunDisabled, None]]:
    """
    Set up wandb for a train function that uses the hf trainer. If report_to is 'none', wandb is disabled; otherwise, if
    report_to is 'wandb', we set init to online to log to the wandb platform. Always uses a config to create the
    run config: wandb.config for a sweep run, or a debug config (via args.path2debug_config) for report_to == 'none' runs.
    """
    report_to = args.report_to
    mode = 'disabled' if report_to == 'none' else 'online'  # no 'dryrun' since wandb logging is already tested by hf
    print(f'{mode=}')
    run: Union[Run, RunDisabled, None] = wandb.init(mode=mode)
    print(f'{run=}')
    # - discover what type of run you're doing (no wandb, or a sweep with wandb)
    print(f'{report_to=}')
    if report_to == 'none':
        # - use debug config from file
        config: wandb.Config = wandb.Config()
        config.update(vars(args))
        config_dict: dict = get_sweep_config(args.path2debug_config)
        config.update(config_dict)
    else:  # sweep run: use the sweep config wandb sends via wandb.config
        # https://docs.wandb.ai/ref/python/run
        print(f'{run.get_sweep_url()=}')
        # - use the sweep config sent from wandb in wandb.config
        config: wandb.Config = wandb.config
        config.update(vars(args))
    return config, run


# - examples & tests

def train_demo(args: Namespace):
    import torch

    # - init run; if report_to is 'wandb' we're in a sweep (online mode, args merged with the sweep config), else report_to is 'none' and wandb is disabled
    config, run = setup_wandb_for_train_with_hf_trainer(args)
    print(f'{config=}')
    uutils.pprint_any_dict(config)

    # Simulate the training process
    num_its = config.get('num_its')  # usually obtained from args or config
    lr = config.get('lr')  # usually obtained from args or config
    train_loss = 8.0 + torch.rand(1).item()
    for i in range(num_its):
        train_loss -= lr * torch.rand(1).item()
        run.log({"lr": lr, "train_loss": train_loss})

    # Finish the current run
    run.finish()


def main_example_run_train_debug_sweep_mode_for_hf_trainer():
    """
python -m pdb -c continue /Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py --report_to none
python -m pdb -c continue /Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py --report_to wandb
    """
    from uutils.hf_uu.hf_argparse.common import get_simple_args

    # - get the most basic hf args
    args: Namespace = get_simple_args()  # just report_to, path2sweep_config, path2debug_config
    print(args)

    # - run train
    report_to = args.report_to
    if report_to == "none":
        train: callable = train_demo
        train(args)
    elif report_to == "wandb":
        path2sweep_config = args.path2sweep_config
        train = lambda: train_demo(args)
        exec_run_for_wandb_sweep(path2sweep_config, train)
    else:
        raise ValueError(f'Invalid hf report_to option: {report_to=}.')


if __name__ == '__main__':
    import time

    start_time = time.time()
    main_example_run_train_debug_sweep_mode_for_hf_trainer()
    print(f"The main function executed in {time.time() - start_time} seconds.\a")

I think I’d need to think about how to incorporate your suggestion into my current approach…not sure yet. The main issue is that I just want this code, really:

hp_args = wandb.get_concrete_args_for_debug_from_config(config)
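
For what it’s worth, here is a rough sketch of how such a helper could be built today on top of the standalone sweeps package suggested above by @thanos-wandb (get_concrete_args_for_debug_from_config is hypothetical, not a real wandb API, and this assumes SweepRun.config maps each parameter name to a dict with a 'value' key):

from sweeps import next_runs

def get_concrete_args_for_debug_from_config(sweep_config: dict) -> dict:
    # hypothetical helper: draw one concrete hp assignment from a sweep config, locally
    suggested = next_runs(sweep_config, [], n=1)[0]
    return {name: spec['value'] for name, spec in suggested.config.items()}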

That is really what I need, to be honest: to avoid having to maintain two configs. I think I’m happy with:

def main_example_run_train_debug_sweep_mode_for_hf_trainer():
    """
python -m pdb -c continue /Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py --report_to none
python -m pdb -c continue /Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py --report_to wandb
    """
    from uutils.hf_uu.hf_argparse.common import get_simple_args

    # - get the most basic hf args
    args: Namespace = get_simple_args()  # just report_to, path2sweep_config, path2debug_config
    print(args)

    # - run train
    report_to = args.report_to
    if report_to == "none":
        train: callable = train_demo
        train(args)
    elif report_to == "wandb":
        path2sweep_config = args.path2sweep_config
        train = lambda: train_demo(args)
        exec_run_for_wandb_sweep(path2sweep_config, train)
    else:
        raise ValueError(f'Invalid hf report_to option: {report_to=}.')

Unless you can articulate a (good) reason why not to do that? It seems simpler to me (though it did require me to discover, by luck, the mode='disabled' param in wandb.init, because I wanted some streamlined way to make args for all my training scripts).

Let me show you the ideal (pseudo) code for wandb sweeps with debug, in my humble opinion of course:

args = argparse_not_complicated_hf_parser_just_normal_simple_argparse()
debug: bool = args.report_to == 'none'

config = load(path2config)
sweep_id = wandb.sweep(config, debug)
train_fn = lambda: train(sweep_id)
wandb.agent(sweep_id, function=train_fn, count=config.run_cap, debug=debug)  # sets count=1 if debug

Now the train function:

def train(sweep_id):
    run = wandb.init(sweep_id=sweep_id)
    # this would get a default config: seeing that the sweep_id doesn't exist on the wandb platform,
    # it would look locally for where the config was saved and generate some smart default, good for debugging
    config = wandb.config
    ...
    trainer.train()

The advantages of this are:

  1. There aren’t odd extra if statements; the sweep runs uniformly like any other run.
  2. The train function can be exactly the same for both normal runs and real sweeps. It’s simpler. Any additional function I run would be the same.

The main issue is likely me passing the sweep_id, but perhaps, given that the agent is already running, it can set up wandb’s runtime env so that the train function looks as follows:

def train():
    run = wandb.init()  # decides on its own whether mode is disabled vs online (vs dryrun); ideally just disabled vs online
    # this would get a default config: seeing that the sweep_id doesn't exist on the wandb platform,
    # it would look locally for where the config was saved and generate some smart default, good for debugging
    config = wandb.config
    ...
    trainer.train()
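
Note that part of this already works today: wandb.init() with no arguments respects the WANDB_MODE environment variable, so the same train function can run disabled or online without any if statements inside it. A minimal sketch (WANDB_MODE is a real wandb environment variable; the train body is illustrative):

import os
import wandb

def train():
    run = wandb.init()             # mode comes from the environment, not from code
    run.log({'train_loss': 1.0})   # no-op when WANDB_MODE=disabled
    run.finish()

os.environ['WANDB_MODE'] = 'disabled'  # or 'online' for a real run
train()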

Hope this helps make more concrete how things would be easier for wandb users, @thanos-wandb.

Hi @brando, thanks so much for taking the time to write up your solution here for future reference. I haven’t tested the code from your GitHub repo yet; is there any case where it wouldn’t work for you? Regarding your concern about passing the sweep_id: this shouldn’t be an issue; the lambda function passed to wandb.agent is a nice hack and should work for you.

Hi @brando, since this issue seems resolved for you, I will go ahead and close this ticket now. Feel free to reopen the conversation here if you have any further issues/questions, and we will be happy to keep investigating!
