Sweep: how is the optimisation metric selected in Bayesian optimisation?

Hi everyone,

When using a sweep with Bayesian optimisation, you are required to select a metric to optimise.

I wanted to know whether, when selecting the configuration for the next runs, the Bayesian optimisation relies on the value of that metric at the end of the run (last epoch) or on the highest value the metric reached during the run.

I ask because, in the first case, if I choose a metric computed on my validation dataset, its value may decrease during training because of overfitting, so the value of the metric at the end would not reflect the best performance of my model.

Thanks for your help

Hi @felix_quinton, please see this detailed article on the specifics of how Bayesian optimization works. If you still have any questions, please let us know.

Hi,

Thanks for your time. I read this article, but it doesn’t seem to answer my question; I might have been unclear.

Let’s take an example:

I want to find the best hyperparameter configuration for my model over 10 runs with Bayesian search.
I choose the accuracy on my validation dataset as the metric to maximise. I train all my runs for 1000 epochs.

For run 1:

  • The run achieves its best validation accuracy at epoch 700, with a value of 0.60
  • After that, the run starts to overfit and the validation accuracy decreases to 0.40 at epoch 1000

For run 2:

  • The run achieves its best validation accuracy at epoch 700, with a value of 0.50
  • After that, the run starts to overfit and the validation accuracy decreases to 0.45 at epoch 1000

In my eyes, run 1 is the better run, since its configuration reached the highest result, with a maximum score of 0.60 compared to 0.50 for run 2.

But the process seems to care only about the value reached at the end of training when judging the quality of a run. This means that, in this case, run 2 would appear to be the better run, with a score of 0.45 compared to 0.40 for run 1. The hyperparameter combination of run 2 would therefore have more influence than that of run 1 in the Bayesian optimisation process.
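To make the two possible rankings concrete, here is a minimal sketch in plain Python. It does not call the sweep API and says nothing about what the sweep actually does internally; it only compares a "last value" score with a "best value" score on the numbers from my example. The names `runs`, `last_value`, `best_value` and `rank` are made up for the illustration, and only the two epochs given above are recorded.

```python
# Validation accuracy per epoch, using only the two points given in the
# example (peak at epoch 700, final value at epoch 1000).
runs = {
    "run 1": {700: 0.60, 1000: 0.40},
    "run 2": {700: 0.50, 1000: 0.45},
}


def last_value(history):
    """Metric at the final recorded epoch."""
    return history[max(history)]


def best_value(history):
    """Highest metric reached at any recorded epoch."""
    return max(history.values())


def rank(runs, score):
    """Return run names ordered from best to worst under `score`."""
    return sorted(runs, key=lambda name: score(runs[name]), reverse=True)


ranking_by_last = rank(runs, last_value)
ranking_by_best = rank(runs, best_value)

print("ranked by last value:", ranking_by_last)
print("ranked by best value:", ranking_by_best)
```

Under the "last value" score run 2 comes first, while under the "best value" score run 1 comes first, which is exactly the discrepancy I am asking about.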

Am I right?

Thanks.