Hi,
I want to build a multiclass classifier using a BERT model. For this, I would like to compare the performance of (at least) two domain-specific BERT models. But before comparing model performance, I would like to find the best hyperparameters using wandb sweeps and the simpletransformers API (simpletransformers has an easy wandb integration).
Currently I'm a bit confused about how to select a good dataset for
- the hyperparameter optimization
- the training with the best hyperparams.
So for the hyperparameter search, should I create n cross-validation splits and then, for each candidate hyperparameter configuration, run a training cycle on each of the n splits?
E.g., I create 2 train/test splits and only want to find the best number of epochs out of [1, 2]:
For both train/test splits, training is run for 1 epoch, and in the next cycle for 2 epochs?
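Roughly what I have in mind is something like this sketch (based on the wandb sweep example from the simpletransformers docs; the file name train.csv, the model name, and the fold count are just placeholders):

```python
import pandas as pd
import sklearn.metrics
import wandb
from sklearn.model_selection import StratifiedKFold
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Hypothetical training data: a DataFrame with "text" and "labels" columns
train_df = pd.read_csv("train.csv")

sweep_config = {
    "method": "grid",  # or "random" / "bayes"
    "metric": {"name": "mean_eval_acc", "goal": "maximize"},
    "parameters": {
        "num_train_epochs": {"values": [1, 2]},
        "learning_rate": {"values": [2e-5, 4e-5]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="bert-multiclass-sweep")

model_args = ClassificationArgs()
model_args.overwrite_output_dir = True
model_args.no_save = True  # no need to keep checkpoints while sweeping
model_args.wandb_project = "bert-multiclass-sweep"

def train():
    wandb.init()
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fold count is arbitrary here
    fold_acc = []
    for train_idx, val_idx in skf.split(train_df, train_df["labels"]):
        fold_train, fold_val = train_df.iloc[train_idx], train_df.iloc[val_idx]
        model = ClassificationModel(
            "bert",
            "bert-base-german-cased",  # placeholder: swap in the domain-specific model
            num_labels=train_df["labels"].nunique(),
            args=model_args,
            sweep_config=wandb.config,  # simpletransformers applies the sweep hyperparameters here
        )
        model.train_model(fold_train)
        result, _, _ = model.eval_model(fold_val, acc=sklearn.metrics.accuracy_score)
        fold_acc.append(result["acc"])
    # The sweep optimizes the metric averaged over all folds
    wandb.log({"mean_eval_acc": sum(fold_acc) / len(fold_acc)})
    wandb.finish()

wandb.agent(sweep_id, train)
```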
And once I have found the best hyperparameters, should I then train the final model on my full dataset?
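I.e., something like this (again only a sketch; the hyperparameter values are placeholders for whatever the sweep reports as best):

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Hypothetical full training set with "text" and "labels" columns
train_df = pd.read_csv("train.csv")

# Placeholder values: copy in whatever the sweep reported as best
best_args = ClassificationArgs()
best_args.num_train_epochs = 2
best_args.learning_rate = 2e-5
best_args.overwrite_output_dir = True

final_model = ClassificationModel(
    "bert",
    "bert-base-german-cased",  # placeholder: the domain-specific model under comparison
    num_labels=train_df["labels"].nunique(),
    args=best_args,
)
final_model.train_model(train_df)  # train on the full dataset, no held-out fold
```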
Hope my questions are reasonably clear.