Hi @yanyiphei, sounds good; that’s completely understandable. Based on our tests, the algorithm is working as expected. Additionally, just yesterday a user mentioned here on the forum that the hyperband example from our repository worked fine for them. I will mark this ticket as closed for now, but I will keep an eye out for any similar reports. If you do find time to try the recommendations mentioned above, we will be more than happy to continue investigating. Thank you!
It is sad to see that you dismissed this problem so easily, based purely on whatever internal unit tests you may have. The `early_terminate` feature is notoriously undocumented, and it didn't even work because of a bug that I myself debugged earlier in this thread. Given this history, I was hoping for a little more skepticism about whether the feature works properly, and a more proactive investigation.
In fact, I think I now know what the issue is. I believe `min_iter` (and the brackets in general) refers to the raw count of how many times a metric has been logged. I had instead assumed that `min_iter` referred to the `step` value. Whenever I log a metric, I always set a global step with `wandb.log(...., step = training_iter)`, because I may have separate `wandb.log` calls that I want tracked under the same step. Because of that assumption, I set `min_iter` based on the step value rather than the actual count, but the target early_terminate metric has a count smaller than `min_iter`, which is why nothing gets terminated. This is clearly evident in the last sweep I gave you, had you known about this nuanced distinction.
I haven't tested this to prove it, but it's the most logical conclusion. Again, if you had good documentation, it would have saved both of us the trouble. Even just renaming it to something like `min_index` would have been much clearer.
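The hypothesis above can be illustrated with a small simulation that does not call W&B at all. This is a sketch under stated assumptions, not W&B's implementation: it assumes hyperband brackets fall at `min_iter * eta**k` with `eta=3`, and, as the poster hypothesizes (untested), that a run's progress is measured by how many times the target metric has been logged rather than by the `step` value. The helper names (`hyperband_brackets`, `logged_metric_history`, `brackets_reached`) and the numbers (a metric logged every 50 steps over 1,000 steps, with `min_iter=100` chosen in step units) are hypothetical.

```python
# Hypothetical simulation of the min_iter / step mismatch described above.
# Assumptions (not confirmed W&B internals): brackets sit at
# min_iter * eta**k, and progress toward a bracket is the number of times
# the target metric has been logged, not the `step` passed to wandb.log.


def hyperband_brackets(min_iter, eta=3, num_brackets=4):
    """Return bracket thresholds [min_iter, min_iter*eta, min_iter*eta**2, ...]."""
    return [min_iter * eta**k for k in range(num_brackets)]


def logged_metric_history(total_steps, eval_every):
    """Steps at which a validation metric would be logged, mimicking
    wandb.log({"val_loss": ...}, step=training_iter) every `eval_every` steps."""
    return [step for step in range(total_steps) if step % eval_every == 0]


def brackets_reached(history, brackets, progress_by="count"):
    """Brackets a run has passed, measuring progress either by the number of
    logged values ("count") or by the last logged step value ("step")."""
    if not history:
        return []
    progress = len(history) if progress_by == "count" else history[-1]
    return [b for b in brackets if progress >= b]


if __name__ == "__main__":
    history = logged_metric_history(total_steps=1000, eval_every=50)
    brackets = hyperband_brackets(min_iter=100)  # min_iter chosen in step units
    print("brackets:", brackets)
    print("reached by count:", brackets_reached(history, brackets, "count"))
    print("reached by step: ", brackets_reached(history, brackets, "step"))
```

With only 20 logged values, none of the step-denominated brackets (100, 300, 900, 2700) is reached when progress is counted by logged values, which would match the symptom that nothing gets terminated; counted by step, the run passes the first three. If the hypothesis holds, `min_iter` would need to be expressed in logged values (with logging every 50 steps, roughly `min_iter / 50`) rather than in steps.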
Hi @yanyiphei, thank you for your message. I want to assure you that we are not dismissing the issue you are encountering. As you know, we have invested considerable time in investigating and resolving the original bug you reported, and we greatly appreciate all the details you provided. That said, I am more than willing to continue investigating, but I will need some additional information from you. Since you’ve mentioned that you won’t have the time required, we could put this on hold until you are more available.
This would involve providing us with a reproducible example, or running a few variations to help us troubleshoot. Specifically, a code snippet would have been helpful here, so that we knew you were specifying a `step` argument in `wandb.log()`. Please keep us posted on whether that resolves the issue for you.