Hello,
I’m just starting to experiment with WandB, and I’m most interested in its hyperparameter optimization features.
I’m trying their Hyperband algorithm and testing it with this example repo.
I reduced the number of epochs to 3, so each full run takes approximately 1 minute.
I’ve tried the configuration given in the repo, with the Hyperband parameters set to:
```yaml
early_terminate:
  type: hyperband
  s: 2
  eta: 2
  max_iter: 8
```
Following the documentation, I expect the algorithm to check whether a run should be stopped at steps [8/2/2, 8/2] = [2, 4].
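To make that bracket arithmetic explicit, here is a small Python sketch. It assumes the brackets sit at `max_iter / eta**k` for `k = s, ..., 1`, which is my reading of the docs rather than WandB's actual code. `bracket_steps` and `reachable_brackets` are hypothetical helper names, and the idea that a run logging once per epoch only reaches steps 1 to 3 is an assumption I haven't verified.

```python
from typing import List


def bracket_steps(max_iter: int, eta: int, s: int) -> List[float]:
    """Return the steps where Hyperband would check runs, in ascending order.

    Assumption: brackets are at max_iter / eta**k for k = s, s-1, ..., 1
    (my reading of the WandB docs, not taken from WandB's source code).
    """
    return [max_iter / eta**k for k in range(s, 0, -1)]


def reachable_brackets(brackets: List[float], num_steps: int) -> List[float]:
    """Hypothetical helper: keep only the brackets a run would actually reach.

    Assumption: a run that logs its metric once per epoch reaches steps 1..num_steps.
    """
    return [b for b in brackets if b <= num_steps]


brackets = bracket_steps(max_iter=8, eta=2, s=2)
print(brackets)                         # [2.0, 4.0]
print(reachable_brackets(brackets, 3))  # [2.0]
```

With my settings this gives brackets at steps 2 and 4, and under the one-log-per-epoch assumption only the step-2 bracket falls inside a 3-epoch run.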
However, every run continues until the last epoch. I noticed two exceptions:
- Keyboard interrupts: 2 of the runs were killed by a keyboard interrupt that I definitely didn’t send. Is this how Hyperband stops a run?
- One of the runs was stopped, i.e. the log says:
```text
2024-04-15 22:42:02,644 - wandb.wandb_agent - INFO - Agent received command: stop
2024-04-15 22:42:02,645 - wandb.wandb_agent - INFO - Stop: csxr2e0f
2024-04-15 22:42:07,651 - wandb.wandb_agent - INFO - Cleaning up finished run: csxr2e0f
2024-04-15 22:42:07,847 - wandb.wandb_agent - INFO - Agent received command: run
...etc
```
It only happened once. Is this how Hyperband stops runs? And why is the run still showing as running in the WandB UI? Is that a bug?
So, to sum up, my questions are:
- How is Hyperband supposed to behave in this example use case?
- How are runs stopped by Hyperband supposed to show up in the logs? Which log messages should I expect to see?
- What do my hyperparameters (s, eta, max_iter) mean in my context, i.e. at which step/epoch should runs be stopped, and how many runs? The results I see do not match what the docs or the paper say…
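For reference, this is roughly the stopping decision I expected at each bracket, written as a hypothetical Python sketch. It follows the successive-halving idea from the Hyperband paper (keep roughly the best 1/eta fraction of runs that reach a bracket) and the docs' description of comparing a run's metric against previously reported values. I don't know whether WandB uses this exact threshold; `should_stop` is a made-up name, not part of the wandb API.

```python
import math
from typing import List


def should_stop(value: float, earlier_values: List[float], eta: int,
                minimize: bool = True) -> bool:
    """Hypothetical successive-halving rule applied when a run reaches a bracket.

    The run survives only if its metric ranks within the best ceil(n / eta)
    of all n values reported at this bracket so far (including its own).
    This is an assumed rule, not WandB's implementation.
    """
    values = earlier_values + [value]
    ranked = sorted(values, reverse=not minimize)  # best value first
    keep = max(1, math.ceil(len(ranked) / eta))    # how many runs survive
    threshold = ranked[keep - 1]                   # worst surviving value
    return value > threshold if minimize else value < threshold


# Minimizing a loss with eta=2: the 4th run to reach the bracket is the worst so far.
print(should_stop(0.9, [0.5, 0.6, 0.7], eta=2))  # True
print(should_stop(0.4, [0.5, 0.6, 0.7], eta=2))  # False
```

Under this reading, with eta = 2 roughly half of the runs reaching each bracket would be stopped, which is not what I observe.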
I tried to share my dashboard but couldn’t find any way to do so. Please let me know how to share my project. FYI, my account is a student account, so I’m on the free plan.
Thanks a lot