Same hyperparameters produce different results in a Bayesian sweep

I was using Sweeps for hyperparameter tuning. I intended to run a grid sweep, but by accident I kept the Bayesian sweep method (left over from the last time I tuned with Bayes). Then something weird happened.


I can understand that Bayesian search may choose the same combination of hyperparameters more than once, but why do the same hyperparameters produce different results? I checked my code, and I have definitely set the seed. Is there anything I missed?
This is the YAML config I use:

method: bayes
project: classify
name: roberta-large
metric:
  goal: maximize
  name: best_valid_metric
parameters:
  task:
    values: ["emotion"]
  batch_size:
    values: [8, 16, 32]
  plm_learning_rate:
    values: [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]
  other_learning_rate:
    values: [1e-4, 2e-4, 3e-4, 4e-4, 5e-4]
  dropout:
    values: [0, 0.3, 0.5]
  model_name:
    value: 1
  num_labels:
    value: 8
command:
  - ${env}
  - ${interpreter}
  - ${program}
  - "--use_wandb"
  - ${args}
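
For completeness, roughly the same sweep expressed as a Python dictionary and launched from code would look something like the sketch below (here train is just a placeholder for my own training function, and the project name matches the one above):

import wandb

sweep_config = {
    "method": "bayes",
    "name": "roberta-large",
    "metric": {"goal": "maximize", "name": "best_valid_metric"},
    "parameters": {
        "task": {"values": ["emotion"]},
        "batch_size": {"values": [8, 16, 32]},
        "plm_learning_rate": {"values": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]},
        "other_learning_rate": {"values": [1e-4, 2e-4, 3e-4, 4e-4, 5e-4]},
        "dropout": {"values": [0, 0.3, 0.5]},
        "model_name": {"value": 1},
        "num_labels": {"value": 8},
    },
}

# Register the sweep and run an agent against it.
sweep_id = wandb.sweep(sweep_config, project="classify")
wandb.agent(sweep_id, function=train)  # train = placeholder for the training entry point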

Dear Zhuojun,

would you be able to confirm if you are using wandb server locally or our public cloud offering?

This is a known bug that has now been fixed in our latest version of wandb server, 0.31.0, which was released yesterday.

Upgrading to this version should fix the issue you are experiencing, where the sweep repeats combinations that should be distinct permutations of parameters.

Warm regards,

Frida

OK, thank you. I used the public cloud offering for hyperparameter tuning yesterday; maybe it had not been updated yet at that point. But I’m still curious why the same parameter choices produce different results in a Bayes sweep :joy:
What matters more is that I can’t reproduce the best result in the picture I have shown :smiling_face_with_tear: although repeated training runs stay consistent on my machine when I try to find out whether there are faults in my code.

Well, I finally found the problem. I use an LSTM in my code, and there are known “non-determinism issues for RNN functions on some versions of cuDNN and CUDA.”
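
In case anyone else runs into this, here is a minimal sketch of the settings I ended up using to force reproducible runs, assuming a PyTorch setup (these are the standard torch/cuDNN reproducibility switches; note that torch.use_deterministic_algorithms can raise an error for ops that have no deterministic implementation):

import os
import random
import numpy as np
import torch

def set_deterministic(seed: int = 42) -> None:
    # Seed every RNG the training loop touches
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to pick deterministic kernels (slower, but reproducible)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Needed by some cuBLAS ops on CUDA >= 10.2 when determinism is enforced
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)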

Dear Zhuojun,

Thanks for sharing your insights on CUDA’s non-deterministic behavior; I was not aware of this myself. I also wanted to follow up on your question about Bayesian sweeps and advise that we currently don’t offer a way to set a random state for the sweep, so there will be some variability in, for example, how it maximizes a particular metric.

I will ensure that your use case is added to a feature request for this.

Please let us know if there is anything else that you would like assistance with at this time.

Best,

Frida

Hi Frida, is there any workaround to prevent this in the public cloud version? I’m having agents repeat parameter combinations 5+ times after only 3-5 completed runs. This is in a search space of only 120 combinations, so using Bayes is currently seeming to be more pain than it’s worth.

Hi Hubert,

Thank you for messaging and sorry that you’re not getting the behavior that you are anticipating. I wonder if you would be able to share the config that you are using so I can spin it up on my side?

I think it is technically possible for a Bayesian sweep to arrive at repeated parameters if the best set of parameters is reached quickly, and I would be very curious/grateful if you could advise whether the parameters you are seeing repeated do reflect your most accurate models.

Best,

Frida

Hi Hubert,

Wanted to check in: I see that on the original thread this was marked as solved. I can look into this further if helpful, but it would be great if you could share the config that you are using, in either .yaml or Python dictionary format.

Look forward to hearing back from you.

Frida

Hi Hubert,

Going to go ahead and close this off for you as we’ve not heard back. Let me know if you need any further help now or in the future.

Best,

Frida
