Thank you for your reply. Maybe I misunderstood, but as I see it, the example I found does not perform hyperparameter tuning at all and instead just performs k-fold CV using sweep runs, i.e., it uses one sweep run per fold, resulting in a total of k runs. Can you confirm that?
Additionally, the example uses multiprocessing for some reason while at the same time joining each of the created processes immediately (see here). In my understanding, that means each process only runs after the previous one has finished (no parallelism), so multiprocessing seems to be used for other, non-obvious reasons.
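To illustrate what I mean, here is a generic sketch (unrelated to wandb; run_fold is just a stand-in for training one fold) contrasting the join-immediately pattern from the example with an actually parallel variant:

import multiprocessing as mp
import time

def run_fold(fold: int) -> None:
    time.sleep(1)  # stand-in for training one fold

if __name__ == "__main__":
    # Pattern from the example: join() right after start() blocks until
    # the process finishes, so the folds run strictly one after another.
    start = time.time()
    for fold in range(4):
        p = mp.Process(target=run_fold, args=(fold,))
        p.start()
        p.join()  # blocks here -> no parallelism
    print(f"sequential: {time.time() - start:.1f}s")  # ~4s

    # Actually parallel: start all processes first, join them afterwards.
    start = time.time()
    procs = [mp.Process(target=run_fold, args=(f,)) for f in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"parallel:   {time.time() - start:.1f}s")  # ~1s plus startup overhead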
The problem for me seems to be that, with a given set of hyperparameters, each call to wandb.init() refers to the very same run internally. So if I loop over the folds using the same hyperparameters, Wandb ends up overwriting the previous run/fold every time.
Here is a minimal working example:
import wandb
import wandb.sdk
import randomname
import numpy as np
from sklearn.model_selection import KFold

SWEEP_CONFIG = {
    "method": "random",
    "name": "my_config",
    "metric": {"goal": "minimize", "name": "val_root_mean_squared_error"},
    "parameters": {
        "param1": {"values": [8, 16, 32]},
        "param2": {"values": [1, 2, 4]},
    },
}


class Experiment:
    def __init__(self) -> None:
        self.x_train = np.random.random((2048, 3, 1))
        self.y_train = np.random.random((2048, 1))

    def train(self) -> None:
        kf = KFold(n_splits=4, shuffle=True)
        cv_name = randomname.get_name()
        for fold, (ix_train, ix_val) in enumerate(kf.split(self.x_train)):
            x_fold_train, y_fold_train = self.x_train[ix_train], self.y_train[ix_train]
            x_fold_val, y_fold_val = self.x_train[ix_val], self.y_train[ix_val]
            run_name = f"{cv_name}-{fold:02}"
            # Intended: one grouped run per fold. In practice this is where
            # the runs end up overwriting each other.
            run = wandb.init(group=f"cv_{cv_name}", name=run_name, reinit=True)
            assert run is not None
            assert type(run) is wandb.sdk.wandb_run.Run
            wandb.summary["cv_fold"] = fold
            wandb.summary["num_cv_folds"] = kf.n_splits
            wandb.summary["cv_random_state"] = kf.random_state
            param1 = wandb.config.param1
            param2 = wandb.config.param2
            # random "result" for the MWE instead of real training
            rmse = param1 * np.mean(y_fold_train) + param2 * np.mean(y_fold_val)
            score = rmse
            wandb.log({"val_root_mean_squared_error": score})
            wandb.finish()


if __name__ == "__main__":
    exp = Experiment()
    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project="my_proj")
    wandb.agent(
        sweep_id=sweep_id,
        function=exp.train,
        project="my_proj",
        # count=40,
    )
Could you point out what needs to change in this example for it to work?
EDIT: Obviously I could just not log the training to Wandb at all and instead only log the average score across all folds; however, this is not what I want. I want to be able to compare the loss graphs of the different folds, etc.
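For reference, the averaging approach I mean (but want to avoid) would look roughly like this; train_fold_rmse and cv_objective are just placeholder names, and the "training" is the same random stand-in as in the MWE above:

import numpy as np
import wandb
from sklearn.model_selection import KFold

def train_fold_rmse(y_fold_train, y_fold_val, param1, param2) -> float:
    # Placeholder for real training; same random "result" as the MWE.
    return param1 * np.mean(y_fold_train) + param2 * np.mean(y_fold_val)

def cv_objective() -> None:
    # One wandb run per hyperparameter set: all folds are evaluated inside
    # a single run and only the mean score is logged, so the per-fold loss
    # curves I want to compare are lost.
    wandb.init()
    x = np.random.random((2048, 3, 1))
    y = np.random.random((2048, 1))
    kf = KFold(n_splits=4, shuffle=True)
    scores = []
    for ix_train, ix_val in kf.split(x):
        scores.append(
            train_fold_rmse(y[ix_train], y[ix_val],
                            wandb.config.param1, wandb.config.param2)
        )
    wandb.log({"val_root_mean_squared_error": float(np.mean(scores))})
    wandb.finish()

# Used as the sweep function in place of Experiment.train:
# wandb.agent(sweep_id=sweep_id, function=cv_objective, project="my_proj")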