I have used wandb for a while now and since this is my first post here I must begin by extending my thanks to the team for making a great product! To my question:
I have just started using Bayesian sweeps instead of grid-based ones, and I have a question about the parameter importance chart that you can add. If I understand it correctly, the Bayesian sweep starts with a couple of runs to build a probability model for e.g. the validation loss, based on tweaking the given parameters within their span. In other words, if permitted to run for, say, 100 runs, there will be many more runs in the part of the parameter span where the probability of getting a low validation loss is high than in the parts where it is low. This also means that some parameter configurations will be vastly overrepresented.
Does it matter that there is a big imbalance in the tested parameter configurations when making the “Parameter Importance” chart/panel? I don’t know much about how the RF importance is computed, but it seems to me that the Correlation would be affected. Now, I don’t know if that would be a bad thing, since I guess it should reflect the probability model, but I would just like to hear someone else’s opinion on the matter.
Thanks for your question. The Parameter Importance plot essentially shows how “reliably” the output metric can be predicted from each hyperparameter.
More specifically, the importance value represents how cleanly a given input hyperparameter was able to split a tree into two output classes, and as a result, how much information was gained from that one hyperparameter. The more information gained, the higher the importance of that input parameter.
I would recommend looking up “Gini importance” and “Gini impurity” if you are interested in diving deeper into this.
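To make the idea of a "clean split" concrete, here is a minimal sketch of Gini impurity and the impurity decrease from a single split. It is only an illustration, not the W&B implementation: the function names, the toy runs, and the "good"/"bad" labelling of runs are all made up for this example, and a real random forest averages this kind of quantity over many splits and many trees.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Weighted drop in Gini impurity when splitting on `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical toy sweep: 8 runs, each labelled by whether the loss was low.
learning_rate = [0.001, 0.002, 0.003, 0.004, 0.1, 0.2, 0.3, 0.4]
batch_size = [16, 64, 16, 64, 16, 64, 16, 64]
outcome = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]

# Splitting on learning rate separates the classes perfectly ...
lr_gain = impurity_decrease(learning_rate, outcome, threshold=0.05)
# ... while splitting on batch size leaves both sides as mixed as before.
bs_gain = impurity_decrease(batch_size, outcome, threshold=32)

print(lr_gain)  # 0.5
print(bs_gain)  # 0.0
```

In this toy data the learning rate would come out as the important parameter and the batch size as unimportant, because only the learning-rate split reduces the impurity.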
Coming back to your question about the relation between the number of runs and importance: importance is a statistical estimate. The more runs you create, the less variance you will see in the importance values and the closer they will be to the true importance of each hyperparameter.
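As a rough illustration of the point about variance, the sketch below simulates many small and many large sweeps and compares how much the estimated correlation between one hyperparameter and the metric scatters. Everything here is an assumption made for the example: the linear relationship, the noise level, and the use of a plain Pearson correlation as a stand-in for the statistic shown in the panel.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulated_sweep_correlation(n_runs, rng):
    """Hypothetical sweep: the metric depends linearly on x, plus noise."""
    xs = [rng.random() for _ in range(n_runs)]
    ys = [x + rng.gauss(0.0, 0.5) for x in xs]
    return pearson(xs, ys)


rng = random.Random(0)  # fixed seed so the comparison is reproducible

# Repeat each sweep size 200 times and measure the scatter of the estimate.
small = [simulated_sweep_correlation(10, rng) for _ in range(200)]
large = [simulated_sweep_correlation(200, rng) for _ in range(200)]

spread_10 = statistics.stdev(small)
spread_200 = statistics.stdev(large)
print(spread_10, spread_200)
```

The estimates from 10-run sweeps scatter much more widely than those from 200-run sweeps, which is why importance values from short sweeps should be read with some caution.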
If there is a large imbalance in the sampled hyperparameter values, this could definitely skew the importance plot. There should not be a large difference, however, since the Bayesian search strikes a good balance between exploring new combinations and exploiting past ones.
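To show how an imbalance could affect the Correlation value in particular, here is a small sketch that computes the correlation between one hyperparameter and the loss twice: once with values spread evenly over the whole span, and once with values concentrated around the optimum, as a search that heavily exploits good regions might produce. The quadratic loss, the optimum at 0.8, and both sets of sample points are hypothetical choices for the example, not output from a real sweep.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def toy_loss(x):
    """Hypothetical validation loss with its minimum at x = 0.8."""
    return (x - 0.8) ** 2


# Evenly spread samples over the full span [0, 1], like a grid search.
uniform_x = [i / 20 for i in range(21)]
# Samples packed tightly around the optimum, in [0.7, 0.9].
concentrated_x = [0.7 + i / 100 for i in range(21)]

uniform_corr = pearson(uniform_x, [toy_loss(x) for x in uniform_x])
concentrated_corr = pearson(concentrated_x,
                            [toy_loss(x) for x in concentrated_x])

print(uniform_corr)       # strongly negative
print(concentrated_corr)  # close to zero
```

With the same underlying loss, the evenly spread samples suggest a strong negative correlation, while the concentrated samples suggest almost none. So the reported value describes the region that was actually sampled rather than the whole span, which is worth keeping in mind when reading the panel after a long Bayesian sweep.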
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Hi Styrbjörn, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!