Hyperparameter tuning combined with k-fold cross validation

I found this official example showcasing an implementation of k-fold cross validation using Sweeps. However, I am doing hyperparameter tuning with sweeps, so I am coming from a different angle: I want to do k-fold CV for one given set of parameters for each sweep run. I would imagine that there should be sub-groups for each CV fold in the sweep view of the web interface.

Is this possible to do with Wandb or should I look elsewhere?

Thank you!

EDIT: the rationale behind this is to prevent the hyperparameter optimization from overfitting the test set. If you have another means to reach this goal, I am open to it.


Thank you for contacting us! Yes, it is possible to perform k-fold cross-validation for a given set of hyperparameters with Wandb Sweeps. In fact, the example you found is a good starting point for implementing k-fold cross-validation in your own Sweep runs.

Thank you for your reply. Maybe I misunderstood, but as I understand it, the example I found does not perform hyperparameter tuning at all and instead just performs k-fold CV using Sweep runs. That is, the example uses one sweep run for each fold, resulting in a total of k runs. Can you confirm that?

Additionally, the example uses multiprocessing for some reason while at the same time joining each of the created processes immediately (see here). In my understanding, that means each process runs after the previous one has finished (no parallelism), so the use of multiprocessing seems to be due to other, non-obvious reasons.
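To make the sequencing point concrete, here is a standalone sketch (not taken from the example itself) of what starting and immediately joining a process inside a loop does:

import multiprocessing
import time

def train_fold(fold):
    time.sleep(0.2)  # stand-in for training one fold

if __name__ == "__main__":
    start = time.monotonic()
    for fold in range(3):
        p = multiprocessing.Process(target=train_fold, args=(fold,))
        p.start()
        p.join()  # blocks here, so the next fold only starts after this one finishes
    elapsed = time.monotonic() - start
    print(elapsed >= 0.6)  # True: the three folds ran sequentially, ~3 * 0.2 s

So the child processes run strictly one after another; the loop gains nothing in terms of parallelism over plain function calls.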

The problem for me seems to be that with a given set of hyperparameters, each call to wandb.init() refers to the very same run internally. So if I want to loop over the folds using the same hyperparameters, Wandb ends up overwriting the previous runs/folds every time.

Here is a minimal working example:

import wandb
import wandb.sdk
import randomname
import numpy as np
from sklearn.model_selection import KFold

SWEEP_CONFIG = {
    "method": "random",
    "name": "my_config",
    "metric": {"goal": "minimize", "name": "val_root_mean_squared_error"},
    "parameters": {
        "param1": {"values": [8, 16, 32]},
        "param2": {"values": [1, 2, 4]},
    },
}

class Experiment:
    def __init__(self) -> None:
        self.x_train = np.random.random((2048, 3, 1))
        self.y_train = np.random.random((2048, 1))

    def train(self) -> None:
        kf = KFold(n_splits=4, shuffle=True)
        cv_name = randomname.get_name()
        for fold, (ix_train, ix_val) in enumerate(kf.split(self.x_train)):
            x_fold_train, y_fold_train = self.x_train[ix_train], self.y_train[ix_train]
            x_fold_val, y_fold_val = self.x_train[ix_val], self.y_train[ix_val]

            run_name = f"{cv_name}-{fold:02}"
            run = wandb.init(group=f"cv_{cv_name}", name=run_name, reinit=True)
            assert run is not None
            assert type(run) is wandb.sdk.wandb_run.Run
            wandb.summary["cv_fold"] = fold
            wandb.summary["num_cv_folds"] = kf.n_splits
            wandb.summary["cv_random_state"] = kf.random_state

            param1 = wandb.config.param1
            param2 = wandb.config.param2
            # random result for MWE
            rmse = param1 * np.mean(y_fold_train) + param2 * np.mean(y_fold_val)
            score = rmse
            wandb.log({"val_root_mean_squared_error": score})

if __name__ == "__main__":
    exp = Experiment()

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project="my_proj")
    wandb.agent(
        sweep_id,
        function=exp.train,
        # count=40,
    )

Could you point out what needs to change in this example for it to work?

EDIT: Obviously I could just NOT log the training to Wandb and instead only return the average result score for all folds - however, this is not what I want. I want to be able to compare the loss graphs of different folds etc.

I just want to make sure I understand correctly: even though you said that this should be possible, I have shown with my MWE that it does not work. In my understanding this means that you misunderstood what I meant (or didn’t know), and it is indeed not possible to perform hyperparameter tuning combined with k-fold CV in W&B, so I will have to look elsewhere. Could you confirm this? Please let me know.


Thank you for your patience! Please give me a couple of minutes to get back to you.

I’m also looking for ways to do this. As @mbp wrote, it would be nice to have metric curves per run and then be able to group these per fold. Right now, if you just create a new run for every fold, it gets overwritten by the next sweep configuration.


Hey @MBP,
Thank you very much for your patience!
I mentioned that the example you found was a good starting point for implementing k-fold cross validation in your sweep runs. Meaning it was a good starting point to add the sweep configurations to launch the agents.
The example you provided does run one sweep run for each fold.
It seems I didn’t fully understand what you wanted. Are you trying to parallelize sweep agents in such a way that each sweep agent performs k-fold CV?

Hi @bill-morrisson, I am not sure what more I can do to explain this. I have even added source code above so that you can run it yourself and see the issue. I will try to rephrase it:

I want to run a hyperparameter study with W&B and I want to use Sweeps for it. This is well-documented and works on its own. For each run I will receive a set of parameters from run.config. So far so good!

Now, within one such run, I want to take that run’s parameters and perform k-fold cross-validation with them. That would be easy: I just run the training in a loop and train one model for each fold, right?
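Spelled out, the loop I have in mind looks roughly like this (a minimal sketch with dummy data and a dummy score in place of real training; `params` stands in for the values a sweep run would receive from run.config):

import numpy as np
from sklearn.model_selection import KFold

# dummy data standing in for the real training set
x = np.random.random((100, 3))
y = np.random.random(100)

# one fixed set of hyperparameters, as a sweep run would receive them
params = {"param1": 8, "param2": 2}

kf = KFold(n_splits=4, shuffle=True, random_state=0)
fold_scores = []
for fold, (ix_train, ix_val) in enumerate(kf.split(x)):
    # train a model on x[ix_train] with `params`, evaluate on x[ix_val];
    # a dummy score stands in for real training here
    score = float(np.mean(y[ix_val]))
    fold_scores.append(score)

print(len(fold_scores))  # one score per fold -> 4

The open question is only the logging part: how to report each iteration of this loop to W&B as its own run.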

But now, I want to log all those runs to W&B as well! How to do that? It seems it’s not possible because when I use the wandb.log and other functions in the loop, the previous value will just be overwritten. This is the problem I want to solve - how to do this without overwriting the previous values?

Additionally, one could think that if I call wandb.init once for each fold, then maybe the wandb.log values will not be overwritten and the folds will be logged separately. Alas, the values are still overwritten. A new run is created in W&B, but the previous run just disappears. For example, the first run/fold is called mysterious-sweep-1, and then the second fold starts as a new run with the name epic-sweep-2. At that point, the run mysterious-sweep-1 has disappeared completely, and all previously logged values are overwritten.

I hope this helps to clarify.

I understand your question.
I just want to point out that I’m having the same issue: previous runs are getting overwritten. I’m ending up with only 3 runs per job, not 1 per k-fold.
When I run the k-fold outside of a sweep agent, it works as intended (grouped in the UI, all runs there).
Anyone any idea?

I have the same problem: multiple runs for the same group, one per fold, are all collapsed into a single run for the entire k folds.
This means that all information from every fold except the last one is lost.

I think we don’t get any official replies because this is either

  • not possible and therefore we wait in vain, or
  • it is so simple and obvious that we are being ignored.

I hope it’s the latter and we can find out how to do it ourselves. :sweat_smile:

Hi @MBP,

Sorry for the time taken to get back to you. We haven’t yet dug into it specifically.
We’ll be looking into it with our engineering team and let you know.

I have posted a GitHub issue. I think they are referring to that.

It should additionally help to reproduce and find the error.


Hi @magenbrot , @mbp : I’ve left a reply here. Hoping to revive the conversation in the GitHub thread.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.