Hi, is there a way to use a set of params in a sweep only when another param takes a particular value?
For example, in Python I want to include sklearn’s DBSCAN and HDBSCAN in the sweep and, for whichever clustering algorithm a given run uses, sample only the parameters related to that algorithm.
E.g. I have this config:
config = {
    'method': 'bayes',
    'metric': {
        'goal': 'maximize',
        'name': 'composite_score'
    },
    'parameters': {
        ...
        'clustering_algorithm': {
            'parameters': {
                'hdbscan': {
                    'parameters': {
                        'min_cluster_size': {'values': [2, 3, 4, 5, 10]},
                        'alpha': {'max': 5.0, 'min': 0.000001},
                        'algorithm': {'values': ['auto']},
                        'n_jobs': {'values': [-1, 1]},
                        'cluster_selection_method': {'values': ['eom', 'leaf']},
                        'min_samples': {'values': [1, 2, 3, 4, 5, 10]}  # 1 because only ever one incoming neighbour
                    }
                },
                'dbscan': {
                    'parameters': {
                        'eps': {'max': 1.0, 'min': 0.0000001},
                        'min_samples': {
                            'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100]}
                    }
                }
            }
        }
    }
}
However, at runtime:
print(self.config['clustering_algorithm'])
Gives:
{'dbscan': {'eps': 0.10271607857111162, 'min_samples': 15}, 'hdbscan': {'algorithm': 'auto', 'alpha': 2.0278536865942995, 'cluster_selection_method': 'leaf', 'min_cluster_size': 3, 'min_samples': 5, 'n_jobs': 1}}
So the agent does not appear to choose between the two algorithms; instead it populates every possible parameter for both.
I’d like to do something like:
if self.config['clustering_algorithm'] == 'hdbscan':
    alg = HDBSCAN(**self.config['clustering_algorithm']['hdbscan'])
else:
    alg = DBSCAN(**self.config['clustering_algorithm']['dbscan'])
My intention is to assess the performance of multiple clustering algorithms without doing any manual switching. Is it possible to do something like this?
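One workaround that might fit is sketched below. It assumes the sweep always samples every nested parameter, as the runtime output above suggests, so the choice of algorithm is moved into its own flat categorical parameter and the matching nested group is picked in code. The names `hdbscan_params`, `dbscan_params` and `select_algorithm_params` are hypothetical, and the parameter lists are shortened for illustration.

```python
# Hypothetical layout: one flat categorical switch plus one nested group per algorithm.
sweep_config = {
    'method': 'bayes',
    'metric': {'goal': 'maximize', 'name': 'composite_score'},
    'parameters': {
        'clustering_algorithm': {'values': ['hdbscan', 'dbscan']},
        'hdbscan_params': {
            'parameters': {
                'min_cluster_size': {'values': [2, 3, 4, 5, 10]},
                'cluster_selection_method': {'values': ['eom', 'leaf']},
            }
        },
        'dbscan_params': {
            'parameters': {
                'eps': {'max': 1.0, 'min': 0.0000001},
                'min_samples': {'values': [1, 2, 3, 4, 5, 10]},
            }
        },
    },
}


def select_algorithm_params(config):
    """Return (algorithm name, kwargs) for the algorithm chosen in this run.

    `config` is assumed to behave like a dict holding the sampled values,
    with both nested groups populated and only one of them used.
    """
    name = config['clustering_algorithm']
    kwargs = dict(config[f'{name}_params'])  # copy so the run config is untouched
    return name, kwargs


# A sampled config shaped like the runtime output shown earlier (values made up).
sampled = {
    'clustering_algorithm': 'dbscan',
    'dbscan_params': {'eps': 0.1, 'min_samples': 15},
    'hdbscan_params': {'min_cluster_size': 3, 'cluster_selection_method': 'leaf'},
}
name, kwargs = select_algorithm_params(sampled)
# In the real run this would become something like:
# alg = HDBSCAN(**kwargs) if name == 'hdbscan' else DBSCAN(**kwargs)
```

With this layout the unused group is still sampled and logged for every run, so the Bayesian search may treat those values as if they influenced the metric. Running one sweep per algorithm would avoid that, at the cost of comparing the two sweeps afterwards.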