Control knobs for sending commands back to the running job / controlling live variables from the das

turian · January 18, 2023, 12:57pm

Feature

There should be a way to attach variables to the logger that you can modify live from controls in the dashboard.

Motivation

When you have a running job and are monitoring its progress, you sometimes want to adjust the learning rate or another hyperparameter (for example, deciding whether to switch to fine-tuning mode).

Pitch

This is a bit of an unspoken black-magic deep learning technique. However, if you read papers from Meta, etc. or talk to hardcore old-school practitioners, they have these super long-running, difficult optimization problems, and say something like: “Well we trained the generator for X thousand epochs, then we enabled the discriminator, then Y thousand epochs later we dropped the learning rate, etc.” This is ideally done by monitoring a live, running job and modifying the variables in situ.

Alternatives

The non-agile way to do this is to let your run go for a while, decide afterwards that you should have changed something at some point in time, code that, run it again and cross your fingers. This is obviously pretty slow and requires luck.
A hacky way to do this is to create a DSL with sentinel files that the running job reads and applies. However, the workflow is useful enough that there should be a common way to do this.
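
For reference, the sentinel-file workaround might look roughly like the sketch below. It is only an illustration and not a W&B feature: the file name `control.json`, the helper `read_overrides`, and the hyperparameter names are all hypothetical. The job polls a JSON file every few steps and applies any overrides for keys it already knows about.

```python
import json


def read_overrides(path, current):
    """Return a copy of `current` updated with values from the JSON control file."""
    try:
        with open(path) as f:
            overrides = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # No file yet, or it is being written right now: keep current values.
        return dict(current)
    if not isinstance(overrides, dict):
        return dict(current)
    updated = dict(current)
    # Only accept keys the job already knows about; ignore everything else.
    updated.update({k: v for k, v in overrides.items() if k in current})
    return updated


def train(control_path, steps, poll_every=10):
    """Toy loop: re-read the control file every `poll_every` steps."""
    hparams = {"lr": 0.1, "fine_tune": False}  # hypothetical defaults
    for step in range(steps):
        if step % poll_every == 0:
            hparams = read_overrides(control_path, hparams)
        # ... one real training step using hparams["lr"] would go here ...
    return hparams
```

Editing `control.json` by hand (say, to `{"lr": 0.01}`) while the job runs would change the learning rate at the next poll. It works, but every project ends up reinventing it, which is why a common dashboard control would be nicer.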

Additional context

I’m not aware of any logging library that does this. So it would make great blog posts to show off and attract more users.

nathank · January 27, 2023, 8:33pm

Hi @turian, thank you for the feature request as well as the use case this would unlock! I will go ahead and submit this to the engineering team and follow up once they have a chance to look into it.

Thank you,
Nate

turian · January 27, 2023, 8:50pm

@nathank Thanks. Here is another simple example, which would make a nice demonstration for a blog post:

I was recently doing a randomized grid search, running 8 jobs simultaneously. After looking at a few training runs, it was clear that any model that did not achieve loss of 0.1 by batch 1000 should be stopped and restarted with new hyperparameters. So this is something that would be useful to control from the dashboard.
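
As a rough sketch of that rule, assuming the thresholds above (loss of 0.1 by batch 1000), the check could be hard-coded like this today. The class name `LossDeadline` is hypothetical, and nothing here uses a W&B API; the point of the request is that the two numbers would become dashboard controls instead of constants.

```python
class LossDeadline:
    """Flag a run that has not reached `target_loss` by `deadline_batch`."""

    def __init__(self, target_loss=0.1, deadline_batch=1000):
        self.target_loss = target_loss
        self.deadline_batch = deadline_batch
        self.best_loss = float("inf")

    def update(self, batch, loss):
        """Record a loss value; return True if the run should be stopped."""
        # Only losses seen up to the deadline count toward the target.
        if batch <= self.deadline_batch:
            self.best_loss = min(self.best_loss, loss)
        return batch >= self.deadline_batch and self.best_loss > self.target_loss


rule = LossDeadline()
for batch, loss in [(250, 0.9), (500, 0.5), (1000, 0.2)]:
    if rule.update(batch, loss):
        # In a real job this is where you would stop and resample hyperparameters.
        break
```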

(Another similar example is that I would then adjust the grid script by hand, to remove learning rates that were too high or too low. I would prefer to do that from the dashboard.)

system · March 28, 2023, 8:51pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
