ROC and PR curves logging

Hi!
I am using (and loving) Wandb so far :slight_smile:

Today I wanted to log my validation roc and pr curves, and I used the command:

wandb.log({"val_roc" : wandb.plot.roc_curve(target_list.numpy(), pred_list.numpy(), labels=None, classes_to_plot=None)})

My task is binary classification, and my data is a NumPy array of shape [m, n], where m is the number of samples and n is the number of classes; in my case n = 1 (i.e. [128, 1]).

I am encountering the following error:

  File "/home/mgiordano/.pyenv/versions/3.8.11/envs/sepsis/lib/python3.8/site-packages/wandb/plot/roc_curve.py", line 74, in roc_curve
    y_true, y_probas[..., i], pos_label=classes[i]
IndexError: index 1 is out of bounds for axis 1 with size 1

I think Wandb is trying to compute the curves for other classes that are not there. Am I missing something?

Thanks!
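
To illustrate why an (n, 1) prediction array can trigger this IndexError, here is a minimal NumPy-only sketch. It assumes, based on the traceback line `y_probas[..., i], pos_label=classes[i]`, that the function loops over the distinct labels in the ground truth and reads one prediction column per label. The way `classes` is built below is an assumption for illustration, not copied from the wandb source.

```python
import numpy as np

# Binary ground truth and a single-column prediction array, as in the question.
y_true = np.array([0, 1, 1, 0])
y_probas = np.array([[0.1], [0.8], [0.7], [0.3]])  # shape (4, 1)

# Assumption: one curve per distinct label found in y_true.
classes = np.unique(y_true)  # two labels, so two columns are needed

error_message = None
try:
    for i in range(len(classes)):
        # Same indexing pattern as in the traceback.
        column = y_probas[..., i]
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

With two labels in the ground truth, the second iteration asks for column 1, which a single-column array does not have.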

Hi @mgiordy, happy to help. Could you verify the shape of the arrays that you are passing to the plotting function? We’ll review the ROC chart function for any errors and get back to you.

Hey, thanks for replying! The shape is (430, 1) :slight_smile:

Hey @mgiordy !

The ROC curve expects an [n, 2] array: one value for the positive class and one value for the negative class.

You most likely want to add a second column whose values are 1 minus the first column.
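
The suggestion above can be sketched as a small NumPy helper. The function name `to_two_column` is hypothetical, and the column order is an assumption: the traceback earlier in the thread suggests that column i is scored against class i, so for labels 0 and 1 this sketch puts the negative-class probability in column 0 and the positive-class probability in column 1.

```python
import numpy as np

def to_two_column(p_pos):
    """Turn positive-class probabilities of shape (n,) or (n, 1) into an (n, 2) array."""
    p_pos = np.asarray(p_pos, dtype=float).reshape(-1)
    # Assumed layout: column 0 = P(class 0), column 1 = P(class 1).
    return np.stack((1.0 - p_pos, p_pos), axis=1)

# Example with three samples in the (n, 1) layout from the question.
pred = np.array([[0.9], [0.2], [0.6]])
y_probas = to_two_column(pred)
print(y_probas.shape)
```

The resulting array can then be passed as the second argument of the `wandb.plot.roc_curve` call shown in the first post.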

Hey thanks for getting back :slight_smile:
Could it be that it expects an [n, 2] array for the predictions and an [n] array for the ground truth? In that case no error is reported. If I instead pass the [n, 2] format for both, I get the following error: ValueError: multilabel-indicator format is not supported.

Thanks!

I think in your case the y_true array must be flattened to shape (430,).
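
A minimal sketch of that flattening step, assuming the ground truth arrives as an (n, 1) column and the predictions as an (n, 2) array. The helper name `prepare_curve_inputs` and its checks are hypothetical and are not part of wandb.

```python
import numpy as np

def prepare_curve_inputs(y_true, y_probas):
    """Flatten the labels to shape (n,) and check that the predictions are (n, k)."""
    y_true = np.asarray(y_true).reshape(-1)
    y_probas = np.asarray(y_probas)
    if y_probas.ndim != 2:
        raise ValueError(f"expected 2-D predictions, got shape {y_probas.shape}")
    if y_probas.shape[0] != y_true.shape[0]:
        raise ValueError("labels and predictions have different sample counts")
    return y_true, y_probas

# Same sizes as reported in the thread: 430 samples, binary task.
labels = np.zeros((430, 1), dtype=int)
probas = np.full((430, 2), 0.5)
flat_labels, checked_probas = prepare_curve_inputs(labels, probas)
print(flat_labels.shape, checked_probas.shape)
```

Keeping the labels 1-D avoids passing a 2-D indicator-style array as the ground truth, which is the situation the ValueError above complains about.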

Hey @mgiordy, wanted to check in if this is resolved or if there is anything else that I can do for you.

Thanks,
Ramit

Hey! Yeah now it works :slight_smile:
However, the visualisation on the wandb website is kinda off… The ROC curves had fpr and tpr on the wrong axes (I’ve fixed it, but shouldn’t the software show them correctly by default?), while the PR curve just looks wrong. Please note that sklearn shows both correctly…

Thanks for letting us know! Could you share a code snippet with a reproduction of the broken PR chart and what you would have expected to see? I can take that information back to our engineering team to have this fixed.

Hey @mgiordy,

I hope you’re doing well. Wanted to check in if you have had a chance to look into the PR chart issue. Looking forward to hearing from you soon.

Thanks,
Ramit

Hi @mgiordy, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

Hi Ramit,

Sorry for my late reply!

On the right is the expected behaviour (the plot produced by running the script); on the left is what I get from the wandb online interface:

Please find the code snippet to reproduce the problem at the end of this message.
I hope we can sort out the issue :wink:

Best,
Marco

# Importing stuff
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

import matplotlib.pyplot as plt

import wandb
wandb_project = "test_proj"
wandb.init(project=wandb_project)

# Loading dataset
X, y = load_iris(return_X_y=True)

# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)

# Limit to the two first classes, and split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X[y < 2], y[y < 2], test_size=0.5, random_state=random_state
)

# Scaling data and fitting classifier
classifier = make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
classifier.fit(X_train, y_train)

# Getting the prediction on test set
y_score = classifier.decision_function(X_test)

# Displaying PR curve with matplotlib
display = PrecisionRecallDisplay.from_predictions(y_test, y_score, name="LinearSVC")
_ = display.ax_.set_title("2-class Precision-Recall curve")

# Adding one dimension to the prediction array as discussed
ones = np.ones(y_test.shape)
pred_wandb = np.stack((y_score, ones - y_score), axis=1)
y_test = y_test[:, None]
print("Y test and Y pred dimensions:", y_test.shape, pred_wandb.shape)
# Logging the PR with wandb
wandb.log({"val_pr" : wandb.plot.pr_curve(y_test, pred_wandb, labels=None, classes_to_plot=None)})

plt.show()
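
One possible contributor to the mismatch, not confirmed anywhere in this thread: `decision_function` returns signed scores rather than probabilities, so `1 - y_score` is not a complementary probability, and the snippet above places the positive-class score in column 0. The sketch below is a hedged alternative that squashes the scores into (0, 1) with a logistic function and, assuming column i is scored against class i, puts the positive class in column 1. The helper name `scores_to_two_column` is hypothetical, and the logistic mapping is only a monotone rescaling, not a calibrated probability.

```python
import numpy as np

def scores_to_two_column(scores):
    """Map signed decision scores to an (n, 2) array in assumed class order."""
    scores = np.asarray(scores, dtype=float).reshape(-1)
    # Logistic squashing keeps the ranking of the scores unchanged.
    p_pos = 1.0 / (1.0 + np.exp(-scores))
    # Assumed layout: column 0 = class 0, column 1 = class 1.
    return np.stack((1.0 - p_pos, p_pos), axis=1)

# Stand-in for the y_score array produced by decision_function.
y_score = np.array([-2.0, 0.5, 3.0])
pred_wandb = scores_to_two_column(y_score)
print(pred_wandb.shape)
```

Because the mapping preserves the ordering of the scores, a PR curve computed from column 1 ranks the samples the same way as the raw scores. With this layout `y_test` would stay one-dimensional instead of being expanded with `y_test[:, None]`.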

Hey Ramit,

Any chance you had a look at this issue? :slight_smile:

Thanks and best!

Marco

Hello!

Any chance you had a look at this issue? :slight_smile:

Best,
Marco
