Example code for logging processes (cross-validation folds) as runs and grouping them?

Hi,

I am new to wandb and am trying to figure out how to log each process (one per cross-validation fold) as its own run and group the runs together. I would like to plot/visualise the performance of each fold in a cross-validation scheme.

In this Colab example notebook, at the very bottom, the section “Basic Setup” says in point 2: "Groups: For multiple processes or cross validation folds, log each process as a runs and group them together. wandb.init(group='experiment-1')". I am not quite sure how to do this. I searched the documentation without success. Can anyone point me to some example code showing how to do this?

Basically, I am interested in visualising ROC curves, etc., for each fold and comparing how much they differ.

Thanks in advance!
Oliver

Hi @olto,
Thank you for trying out Weights and Biases! If you call

run = wandb.init(project='my_project', group='my_group')

from within a KFold for loop, a new run is started for each fold, and all of the runs land in the same group. You can then call

wandb.log({"roc" : wandb.plot.roc_curve(ground_truth, predictions)})

to log the ROC curve for that fold to its run.

Here is what that may look like:

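import wandb
from sklearn.linear_model import LogisticRegression  # stand-in only; use your own model
from sklearn.model_selection import KFold

# X (features) and y (labels) are assumed to be NumPy arrays defined earlier.
k_fold = KFold(n_splits=5)

for train_idx, test_idx in k_fold.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # One run per fold; the shared group name ties the folds together.
    run = wandb.init(project='my_project', group='my_group')

    model = LogisticRegression()  # build your model here
    model = model.fit(X_train, y_train)
    y_hat = model.predict_proba(X_test)  # class probabilities per test sample

    # Compare against the labels of the held-out fold, not the full y.
    wandb.log({"roc": wandb.plot.roc_curve(y_test, y_hat)})

    wandb.finish()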
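
To compare how much the folds differ, it may also help to give each fold's run its own name and to log a scalar such as ROC AUC next to the curve. The following is only a sketch under assumptions: `fold_run_kwargs` and `binary_auc` are hypothetical helpers (not part of wandb), the labels are assumed to be binary (0/1), and `name`, `job_type` and `config` are optional `wandb.init` arguments. In practice `sklearn.metrics.roc_auc_score` could replace the hand-written AUC.

```python
def fold_run_kwargs(fold, project="my_project", group="my_group"):
    """Build wandb.init keyword arguments for one cross-validation fold."""
    return {
        "project": project,
        "group": group,            # same group for every fold
        "name": f"fold-{fold}",    # distinct display name per fold
        "job_type": "cv-fold",     # hypothetical label, any string works
        "config": {"fold": fold},
    }


def binary_auc(y_true, scores):
    """ROC AUC for 0/1 labels by pairwise comparison (ties count as 0.5).

    Quadratic in the number of samples, so only meant for illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Inside the fold loop (requires wandb), with fold coming from enumerate():
#   run = wandb.init(**fold_run_kwargs(fold))
#   wandb.log({"auc": binary_auc(y_test, y_hat[:, 1])})
#   wandb.finish()
```

With one `auc` value per run, the grouped runs can be compared as plain numbers in addition to the per-fold ROC plots.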

Also, here is a more advanced example using sweeps. Let me know if this helps clarify things or if you have any more questions.

Thank you,
Nate

Thanks a lot, Nate!

I ended up with something similar by playing around with various approaches. One thing I did differently though was calling

wandb.finish()

only once at the end of the script outside the KFold for loop. Looking at your code suggestions, I understand that I should finish every single run with wandb.finish().

Thanks a lot also for the example using sweeps. That will come in handy.

Cheers
Oliver

@olto No problem! Thanks for the question. It gave me a chance to dig into this myself.

When you start a new run with wandb.init(), it basically calls wandb.finish() on the previous run, so both versions of the code work essentially the same way. I just like to call it explicitly to finish each run, but either works!
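
If you would rather not rely on that implicit behaviour, the run returned by wandb.init() can also be used as a context manager, which finishes the run when the block exits. A minimal sketch: `log_folds` and its `init_run` parameter are hypothetical names, added only so the loop can be tried out without wandb installed.

```python
def log_folds(fold_metrics, init_run=None, project="my_project", group="my_group"):
    """Log one run per fold, closing each run before the next one starts.

    fold_metrics: list of dicts, one per fold, e.g. [{"auc": 0.91}, ...].
    init_run: callable with the same keyword interface as wandb.init
              (hypothetical hook; defaults to wandb.init itself).
    """
    if init_run is None:
        import wandb  # imported lazily so the loop can be tested without wandb
        init_run = wandb.init

    run_names = []
    for fold, metrics in enumerate(fold_metrics):
        name = f"fold-{fold}"
        # Leaving the with-block finishes the run, even if logging raises.
        with init_run(project=project, group=group, name=name) as run:
            run.log(metrics)
        run_names.append(name)
    return run_names
```

Calling `log_folds([{"auc": 0.91}, {"auc": 0.88}])` would then create two runs in the same group, each finished before the next begins.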

Thank you,
Nate
