Ring-fencing available parameters based on other categorical params during sweeps

Hi, is there a way to use a set of parameters in a sweep only when another parameter takes a particular value?

For example, in Python I want to run sklearn’s DBSCAN and HDBSCAN as part of the sweep and, for whichever clustering algorithm a given run uses, also sweep that algorithm’s related parameters.

E.g. I have this config:

config = {
        'method': 'bayes',
        'metric': {
            'goal': 'maximize',
            'name': 'composite_score'
        },
        'parameters': {
            ...
            'clustering_algorithm': {
                'parameters': {
                    'hdbscan': {
                        'parameters': {
                            'min_cluster_size': {'values': [2, 3, 4, 5, 10]},
                            'alpha': {'max': 5.0, 'min': 0.000001},
                            'algorithm': {'values': ['auto']},
                            'n_jobs': {'values': [-1, 1]},
                            'cluster_selection_method': {'values': ['eom', 'leaf']},
                            'min_samples': {'values': [1, 2, 3, 4, 5, 10]}  # 1 because only ever one incoming neighbour
                        }
                    },
                    'dbscan': {
                        'parameters': {
                            'eps': {'max': 1.0, 'min': 0.0000001},
                            'min_samples': {
                                'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100]}
                        }
                    }
                }
            }
        }}

However at runtime:

print(self.config['clustering_algorithm'])

Gives:

{'dbscan': {'eps': 0.10271607857111162, 'min_samples': 15}, 'hdbscan': {'algorithm': 'auto', 'alpha': 2.0278536865942995, 'cluster_selection_method': 'leaf', 'min_cluster_size': 3, 'min_samples': 5, 'n_jobs': 1}}

So the agent doesn’t appear to choose between the two algorithms; instead it populates every parameter of both.
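One common workaround, sketched here under assumptions rather than as an official W&B feature, is to flatten the config: make `clustering_algorithm` a plain categorical parameter and give each algorithm’s settings a prefix. The `hdbscan__` / `dbscan__` naming is a hypothetical convention. The sweep still samples every key on every run; the training code is expected to ignore the keys belonging to the algorithm it didn’t pick. Ranges are copied from the config above, and the elided `...` parameters are left out.

```python
# Hypothetical flattened sweep config: one categorical switch plus prefixed,
# per-algorithm parameters. Every key is still sampled on every run; the
# training function is expected to ignore the inactive algorithm's keys.
sweep_config = {
    'method': 'bayes',
    'metric': {'goal': 'maximize', 'name': 'composite_score'},
    'parameters': {
        'clustering_algorithm': {'values': ['hdbscan', 'dbscan']},
        # HDBSCAN-only settings (prefix is an assumed naming convention)
        'hdbscan__min_cluster_size': {'values': [2, 3, 4, 5, 10]},
        'hdbscan__alpha': {'max': 5.0, 'min': 0.000001},
        'hdbscan__algorithm': {'values': ['auto']},
        'hdbscan__n_jobs': {'values': [-1, 1]},
        'hdbscan__cluster_selection_method': {'values': ['eom', 'leaf']},
        'hdbscan__min_samples': {'values': [1, 2, 3, 4, 5, 10]},
        # DBSCAN-only settings
        'dbscan__eps': {'max': 1.0, 'min': 0.0000001},
        'dbscan__min_samples': {
            'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30,
                       40, 50, 60, 70, 80, 100]},
    },
}
```

A caveat with `bayes`: the search may treat the ignored keys as if they mattered, since they are recorded on every run, so this layout can add noise to the search.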

I’d like to do something like:

  from sklearn.cluster import DBSCAN, HDBSCAN  # HDBSCAN needs scikit-learn >= 1.3

  if self.config['clustering_algorithm'] == 'hdbscan':
      alg = HDBSCAN(**self.config['clustering_algorithm']['hdbscan'])
  else:
      alg = DBSCAN(**self.config['clustering_algorithm']['dbscan'])
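With a flattened, prefixed layout (a hypothetical convention: a categorical `clustering_algorithm` key plus `<name>__<param>` keys), the dispatch above can be driven by a small helper that picks out only the active algorithm’s keyword arguments. This is a minimal sketch assuming the run config is a plain dict or anything exposing `.items()`; `active_params` is an illustrative name, not part of any library.

```python
def active_params(config):
    """Return the chosen algorithm name and its un-prefixed keyword arguments.

    Assumes the hypothetical flat layout: a categorical 'clustering_algorithm'
    key plus '<name>__<param>' keys for each algorithm.
    """
    name = config['clustering_algorithm']
    prefix = f'{name}__'
    # Keep only keys for the chosen algorithm and strip their prefix.
    kwargs = {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }
    return name, kwargs


# Example run config, shaped like what a sweep agent might hand to one run.
example = {
    'clustering_algorithm': 'dbscan',
    'dbscan__eps': 0.1,
    'dbscan__min_samples': 15,
    'hdbscan__min_cluster_size': 3,
    'hdbscan__min_samples': 5,
}
name, kwargs = active_params(example)
print(name, kwargs)  # dbscan {'eps': 0.1, 'min_samples': 15}
```

Inside the training function, the result could then feed `HDBSCAN(**kwargs) if name == 'hdbscan' else DBSCAN(**kwargs)`, with both classes imported from `sklearn.cluster`.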

My intention is to compare the performance of several clustering algorithms without any manual switching. Is something like this possible?

Hi @dmells123, thanks for your question! I’d recommend using Launch; here is a Colab example that creates conditional sweeps. Please let me know if that works!
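If Launch isn’t an option, one plain-Python alternative (our assumption, not what the reply suggests) is to keep the nested layout from the question but split it into one sweep per algorithm. Each run then sees exactly one entry under `clustering_algorithm`, so the code can unpack it without a switch. The helper names and the `'<name>-sweep'` naming are hypothetical.

```python
import copy


def per_algorithm_sweeps(nested_config, group_key='clustering_algorithm'):
    """Split one nested sweep config into one sweep config per algorithm.

    Assumes the layout from the question: nested_config['parameters'][group_key]
    holds a 'parameters' dict keyed by algorithm name.
    """
    algorithms = nested_config['parameters'][group_key]['parameters']
    sweeps = {}
    for name, spec in algorithms.items():
        cfg = copy.deepcopy(nested_config)  # leave the input untouched
        cfg['name'] = f'{name}-sweep'  # hypothetical naming scheme
        # Keep only this algorithm's nested block.
        cfg['parameters'][group_key] = {'parameters': {name: copy.deepcopy(spec)}}
        sweeps[name] = cfg
    return sweeps


def unpack_algorithm(run_config, group_key='clustering_algorithm'):
    """Inside a run, return (algorithm name, its params) from the single nested entry."""
    (name, params), = run_config[group_key].items()  # fails loudly if >1 entry
    return name, dict(params)


# Trimmed version of the question's config, for illustration.
nested = {
    'method': 'bayes',
    'metric': {'goal': 'maximize', 'name': 'composite_score'},
    'parameters': {
        'clustering_algorithm': {
            'parameters': {
                'hdbscan': {'parameters': {'min_cluster_size': {'values': [2, 3, 4, 5, 10]}}},
                'dbscan': {'parameters': {'eps': {'max': 1.0, 'min': 0.0000001}}},
            }
        }
    },
}
sweeps = per_algorithm_sweeps(nested)
print(sorted(sweeps))  # ['dbscan', 'hdbscan']
print(unpack_algorithm({'clustering_algorithm': {'dbscan': {'eps': 0.1}}}))  # ('dbscan', {'eps': 0.1})
```

Each config could then be registered with `wandb.sweep(cfg, project=...)` and run with its own `wandb.agent`, and the best `composite_score` of each sweep compared afterwards. Arguably this also keeps the Bayesian search from modelling parameters that have no effect on a given run.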

Hi David, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
