Hi,
I’m using Weights & Biases Sweeps to run and track hyperparameter optimization for an XGBoost model.
I’ve defined the sweep configuration like so:
sweep_configs = {
    "method": "random",
    "metric": {"name": "Mean_RMSE_score", "goal": "minimize"},
    "parameters": {
        "max_depth": {"values": [4, 9, 14, 19, 29, 34, 44, 49, 54, 59]},
        "learning_rate": {"values": [0.05, 0.0697, 0.0973, 0.1357, 0.1893, 0.2641, 0.3684, 0.5139, 0.7169, 1.0]},
        "objective": {"values": ["reg:squarederror"]},
        "reg_alpha": {"values": [0.001, 0.0027, 0.0046, 0.0077]},
        "reg_lambda": {"values": [0.0010, 0.0046, 0.0129, 0.0215, 0.0359, 0.0599, 0.1]},
        "min_child_weight": {"values": [0.1802, 0.3246, 0.5848, 1.0536, 3.42, 6.1616, 11.1009, 20.0]},
        "random_state": {"values": [46]},
    },
}
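For context, I register the sweep and launch the trials roughly like this (simplified; train is the K-fold objective sketched a bit further down, and count is just an illustrative trial budget):

import wandb

sweep_id = wandb.sweep(
    sweep_configs,
    entity=SETTINGS["WANDB_ENTITY"],
    project=SETTINGS["WANDB_PROJECT"],
)
# Each agent call runs one random-search trial of the train() function
wandb.agent(sweep_id, function=train, count=50)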
For each trial in the random search, I perform K-fold cross-validation and log the mean RMSE score to the run with
wandb.log({"Mean_RMSE_score": score})
I then use sweep.best_run()
to fetch the run (and its hyperparameter config) with the lowest mean RMSE. However, best_run()
does not return that run; in my case it returns the run with the 5th-lowest RMSE. Below is a screenshot of some runs in one of my sweeps.
As a result, I’m not getting the best set of hyperparameters to build the best model.
Here is the code where I access the sweep by its sweep ID and call best_run():
import wandb

def get_best_run(sweep_id):
    api = wandb.Api()
    sweep = api.sweep(f"{SETTINGS['WANDB_ENTITY']}/{SETTINGS['WANDB_PROJECT']}/{sweep_id}")
    best_run = sweep.best_run()
    run_id = best_run.id
    with init_wandb_run(
        name="best_hpo_run",
        group="train",
        job_type="hpo",
        run_id=run_id,
        resume="must",
    ) as run:
        best_config = dict(run.config)
    return best_config
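For what it’s worth, I’d expect best_run() to agree with manually sorting the runs by their logged summary value, something like the sketch below (get_best_run_manually is just an illustrative name; runs missing the metric sort last):

import wandb

def get_best_run_manually(sweep_id):
    api = wandb.Api()
    sweep = api.sweep(f"{SETTINGS['WANDB_ENTITY']}/{SETTINGS['WANDB_PROJECT']}/{sweep_id}")
    # Pick the run whose summary holds the smallest Mean_RMSE_score
    return min(
        sweep.runs,
        key=lambda r: r.summary.get("Mean_RMSE_score", float("inf")),
    )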
What could be going wrong here?