Best practices for many quick runs?

I have a project where I am running many, many runs across seeds. None of them take particularly long, and I would like to log metrics for all of them (both the individual run metrics and the group metrics matter to me). Unfortunately, my compute environment forces me to run W&B in offline mode (the compute nodes are not connected to the internet), and as a result syncing has become an extreme bottleneck in my work. Has anyone encountered this kind of issue before and come up with a way to deal with it?

Hi @evanv , thanks for writing in. We’re looking into this for you.

Hey @evanv,

I’m sorry to hear that the offline-to-online sync for wandb has been a bottleneck for you. Would you be able to share more context, i.e. the commands you are running or a minimal example that reproduces the issue?

In the meantime, have you tried adjusting the arguments to wandb sync (wandb sync - Documentation) to send runs in batches? Using glob patterns, or cleaning out runs you no longer need, can reduce the load compared to syncing all runs at once.
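For example, a rough sketch along these lines (assuming the default wandb/offline-run-* directory layout on your machine; the batch size is just illustrative) would sync runs in smaller batches:

# Rough sketch: sync offline runs in batches instead of all at once.
# Assumes the default offline run directories ("wandb/offline-run-*");
# adjust the glob pattern to your setup.
import glob
import subprocess

BATCH_SIZE = 20  # illustrative; tune to what your login node tolerates

run_dirs = sorted(glob.glob('wandb/offline-run-*'))
for i in range(0, len(run_dirs), BATCH_SIZE):
    batch = run_dirs[i:i + BATCH_SIZE]
    # "wandb sync <dir> [<dir> ...]" uploads the given offline runs
    subprocess.run(['wandb', 'sync', *batch], check=True)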

Hi @a-sh0ts , sure thing!
A minimal example for what I am doing looks like this:

import torch
import wandb


def do_hyperparam_search():
    configs = [
        {'lr': lr, 'lambda': lmbda, 'model_type': model_type}
        for lr in [1, 0.1, 0.001]
        for lmbda in [1, 10, 100]
        for model_type in ['foo', 'bar']
    ]
    for config in configs:
        run_config(config)


def run_config(config):
    for seed in range(100):
        do_wandb_run(seed, config)


def do_wandb_run(seed, config):
    torch.manual_seed(seed)
    wandb.init(config=config, mode='offline')
    for epoch in range(100):
        for cls in range(5):
            value = 0.0  # placeholder for the real per-class metric
            # log each class under its own key so values aren't overwritten
            wandb.log({f'some_metric_for_class_{cls}': value}, step=epoch)
    wandb.finish()

I have a particular algorithm I am testing which is only sometimes convergent, so it is necessary to run it over many different seeds and view both the average behavior and the variance in that behavior. The issue I am running into is that even a single hyperparameter configuration produces a hundred runs (one per seed), and testing multiple configurations multiplies the problem. Unfortunately, my compute nodes are all offline, so I have to sync all my runs manually. Although I have been using wandb sync --sync-all, with thousands of runs that too becomes untenable. Is there a better way to run these experiments?

Hi @evanv!
You could try using W&B Sweeps with the bayes or random search strategies. You wouldn’t be searching the entire space, but you would still get a good picture of the search landscape without needing to, which would reduce the number of runs you’d have to sync with wandb.
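As a rough sketch (the metric name, project name, and count here are just illustrative), a random-search sweep over your grid might look like:

import wandb

sweep_config = {
    'method': 'random',  # or 'bayes' for Bayesian optimization
    'metric': {'name': 'some_metric_for_class', 'goal': 'minimize'},
    'parameters': {
        'lr': {'values': [1, 0.1, 0.001]},
        'lambda': {'values': [1, 10, 100]},
        'model_type': {'values': ['foo', 'bar']},
    },
}

def train():
    run = wandb.init()
    # placeholder objective so the sweep has something to optimize;
    # replace with your real training loop reading run.config
    run.log({'some_metric_for_class': run.config.lr * run.config['lambda']})
    run.finish()

sweep_id = wandb.sweep(sweep_config, project='my-project')
# sample 20 configs instead of exhausting the full grid
wandb.agent(sweep_id, function=train, count=20)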

Hey @_scott , thanks so much for the advice! I have been wanting to integrate Sweeps for a while but was not quite clear on how they would work. If I create a sweep, will it only require me to sync one file (for the sweep), or all the runs associated with the sweep? The primary multiplicity in my code comes from the fact that I have to run 100 seeds per configuration.

You’ll still have to sync your runs if you want to view them in W&B.

The primary multiplicity in my code comes from the fact that I have to run 100 seeds per configuration.

Wow, that’s a lot of seeds. I’ll move this back into “W&B Best Practices” and hopefully someone in the community has seen this and can give some advice. You could also try wandb local, which would give you your own self-hosted W&B; this would require a bit of upfront time investment but would likely ease that syncing bottleneck.

Got it, thanks for the advice. I will look into wandb local to see whether it is tenable to set it up on my system. If anyone else has advice, it is welcome too 🙂

Hey @evanv!

@_scott’s advice on using wandb sweeps and/or wandb local should hopefully have helped with some of your issues logging/syncing large volumes of runs in an offline setting. The engineering team is aware of the problems you’re facing and would love to hear suggestions on what the ideal behavior here would be!

Hey @a-sh0ts , thanks so much for making the engineering team aware! I have been thinking a bit about what optimal functionality would look like for me. Day to day, I mostly care about some statistic over the seeds I am collecting (mean, median, max, etc.) as well as the standard error of that statistic across runs. Perhaps functionality to store only these statistics, rather than all the information across all runs, would allow for more efficient storage and processing of the data?
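For instance, here is a rough sketch of what I have in mind (run_one_seed is a hypothetical stand-in for my actual training code): compute the statistics across seeds locally, then log a single summary run per configuration:

import numpy as np
import wandb

def run_one_seed(seed, config, n_epochs):
    # hypothetical stand-in: return one metric value per epoch
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_epochs)

def do_summary_run(config, n_seeds=100, n_epochs=100):
    # results[seed, epoch] -> metric value, computed entirely offline
    results = np.array([run_one_seed(s, config, n_epochs)
                        for s in range(n_seeds)])
    run = wandb.init(config=config, mode='offline')
    for epoch in range(n_epochs):
        vals = results[:, epoch]
        run.log({
            'metric_mean': vals.mean(),
            'metric_median': np.median(vals),
            'metric_max': vals.max(),
            'metric_stderr': vals.std(ddof=1) / np.sqrt(len(vals)),
        }, step=epoch)
    run.finish()

That way only one run per configuration would need to sync, instead of one per seed.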