Hi,
I tried several methods to handle imbalanced datasets.
I used a simple single-neuron ANN as a logistic regression model, trained on a churn dataset.
Under- and oversampling worked well (especially SMOTE), but to get a deeper understanding I wonder whether there are even simpler ways to do this than altering the original dataset.
My questions (sorry, there are many of them):
- Is changing the threshold value after training a model a usual and proper/professional way to handle an imbalanced-dataset classification problem?
  - If so, how to do it? Just run the trained model over the test data, varying the threshold and calculating F1 each time?
  - Is the best threshold the one where the F1 score is highest?
- Is it a good idea to find the model with the best AUC score using wandb sweeps, and then find the threshold for that model that maximizes the F1 score?
- What if I train a model with a threshold value other than 0.5?
  - For example, running wandb sweeps to find the best AUC or F1 score?
- Applying class weights helps improve recall, but precision decreases in return.
  - How do I get this balance right? How do I maximize the F1 score?
- Would adding one or more hidden layers improve my model?
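
To make the threshold question concrete, here is a minimal sketch of what I mean by running over the predictions, varying the threshold and calculating F1. It is plain Python with made-up labels and probabilities, not my actual churn data, and the function names `f1_at_threshold` and `best_f1_threshold` are my own, not from any library. I am assuming the sweep should be done on a separate validation split rather than on the test data, so that the final test score is not tuned on the same examples.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the positive class when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # If there are no positives and no positive predictions, report 0.0
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob, candidates=None):
    """Return (threshold, f1) with the highest F1 among the candidates."""
    if candidates is None:
        # Hypothetical default grid: 0.01, 0.02, ..., 0.99
        candidates = [i / 100 for i in range(1, 100)]
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    # On ties, max() keeps the first (lowest) threshold in the grid
    best_f1, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_f1


# Toy validation data (made up): 1 = churn, 0 = no churn
val_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
val_probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80]

threshold, f1 = best_f1_threshold(val_labels, val_probs)
default_f1 = f1_at_threshold(val_labels, val_probs, 0.5)
print(f"best threshold on validation data: {threshold:.2f}, F1 = {f1:.3f}")
print(f"F1 at the default threshold 0.5: {default_f1:.3f}")
```

With a real model, `val_probs` would be the sigmoid outputs of the network on the validation split, and the chosen threshold would then be applied unchanged to the test set.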
Thank you!