ROC and PR curves logging

mgiordy · January 11, 2023, 11:43pm

Hi!
I am using (and loving) Wandb so far.

Today I wanted to log my validation roc and pr curves, and I used the command:

wandb.log({"val_roc" : wandb.plot.roc_curve(target_list.numpy(), pred_list.numpy(), labels=None, classes_to_plot=None)})

My task is binary classification, and my data is a NumPy array of shape [m, n], where m is the number of samples and n is the number of classes; in my case n = 1 (i.e. [128, 1]).

I am encountering the following error:

  File "/home/mgiordano/.pyenv/versions/3.8.11/envs/sepsis/lib/python3.8/site-packages/wandb/plot/roc_curve.py", line 74, in roc_curve
    y_true, y_probas[..., i], pos_label=classes[i]
IndexError: index 1 is out of bounds for axis 1 with size 1

I think Wandb is trying to compute the curves for other classes that are not there. Am I missing something?

Thanks!

mohammadbakir · January 17, 2023, 9:09pm

Hi @mgiordy, happy to help. Could you verify the shape of the arrays that you are passing to the plotting function? We’ll review the ROC chart function for any errors and get back to you.

mgiordy · January 18, 2023, 12:17am

Hey, thanks for replying! The shape is (430, 1).

ramit_goolry · January 23, 2023, 6:01pm

Hey @mgiordy !

The ROC curve expects an [n, 2] array: one value for the positive class and one value for the negative class.

You most likely want to add a second column whose values are 1 - axis_1, where axis_1 is the column you already have.
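A minimal sketch of that conversion, using NumPy only. The helper name `to_two_column` and the sample values are hypothetical. The column order (class 0 first, class 1 second) is an assumption based on the traceback above, where column `i` of `y_probas` is scored against `classes[i]`.

```python
import numpy as np


def to_two_column(pos_probs):
    """Turn positive-class probabilities of shape (n,) or (n, 1) into an (n, 2) array.

    Assumption: column 0 holds P(class 0) and column 1 holds P(class 1).
    """
    p = np.asarray(pos_probs, dtype=float).reshape(-1)  # flatten to (n,)
    return np.stack((1.0 - p, p), axis=1)               # shape (n, 2)


# Hypothetical predictions in the (n, 1) layout described in the question
preds = np.array([[0.9], [0.2], [0.6]])
y_probas = to_two_column(preds)
print(y_probas.shape)
```

The resulting `y_probas` would then be passed as the second argument of `wandb.plot.roc_curve` in place of the single-column array.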

mgiordy · January 24, 2023, 5:33pm

Hey, thanks for getting back.
Could it be that it expects an [n, 2] array for the predictions and an [n] array for the ground truth? In that case no error is reported. If I pass the [n, 2] format for both, I get the following error: ValueError: multilabel-indicator format is not supported.

Thanks!

marioparreno · January 25, 2023, 4:28pm

I think in your case the y_true array must be flattened to shape (430,).
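A small sketch of the flattening step, with hypothetical labels. The "multilabel-indicator" error most likely appears because a two-column 0/1 label array is read as one indicator column per label, so the ground truth should stay one-dimensional while only the predictions get two columns.

```python
import numpy as np

# Hypothetical ground truth stored as a column vector, shape (n, 1)
y_true = np.array([[0], [1], [1], [0]])

# Flatten to shape (n,), leaving the label values unchanged
y_true_flat = y_true.ravel()
print(y_true.shape, "->", y_true_flat.shape)
```

`y_true.reshape(-1)` or `np.squeeze(y_true, axis=1)` would give the same result here.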

ramit_goolry · January 30, 2023, 3:41am

Hey @mgiordy, wanted to check in if this is resolved or if there is anything else that I can do for you.

Thanks,
Ramit

mgiordy · January 31, 2023, 1:13pm

Hey! Yeah, now it works.
However, the visualisation on the wandb website is kinda off… The ROC curves had FPR and TPR on the wrong axes (I’ve fixed it, but shouldn’t the software show them correctly by default?), while the PR curve just looks wrong. Please note that sklearn shows both correctly…

ramit_goolry · February 1, 2023, 5:39pm

Thanks for letting us know! Could you share a code snippet with a reproduction of the broken PR chart and what you would have expected to see? I can take that information back to our engineering team to have this fixed.

ramit_goolry · February 5, 2023, 7:51pm

Hey @mgiordy,

I hope you’re doing well. Wanted to check in if you have had a chance to look into the PR chart issue. Looking forward to hearing from you soon.

Thanks,
Ramit

ramit_goolry · February 7, 2023, 4:24pm

Hi @mgiordy, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

mgiordy · February 27, 2023, 10:24pm

Hi Ramit,

Sorry for my late reply!

On the right is the expected behaviour, which is what I get by running the script; on the left is what I get from the wandb online interface.

Please find the code snippet to reproduce the problem at the end of this message.
I hope we can sort out the issue.

Best,
Marco

# Importing stuff
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

import matplotlib.pyplot as plt

import wandb
wandb_project = "test_proj"
wandb.init(project=wandb_project)

# Loading dataset
X, y = load_iris(return_X_y=True)

# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)

# Limit to the two first classes, and split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X[y < 2], y[y < 2], test_size=0.5, random_state=random_state
)

# Scaling data and fitting classifier
classifier = make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
classifier.fit(X_train, y_train)

# Getting the prediction on test set
y_score = classifier.decision_function(X_test)

# Displaying PR curve with matplotlib
display = PrecisionRecallDisplay.from_predictions(y_test, y_score, name="LinearSVC")
_ = display.ax_.set_title("2-class Precision-Recall curve")

# Adding one dimension to the prediction array as discussed
ones = np.ones(y_test.shape)
pred_wandb = np.stack((y_score, ones - y_score), axis=1)
y_test = y_test[:, None]
print("Y test and Y pred dimensions:", y_test.shape, pred_wandb.shape)
# Logging the PR with wandb
wandb.log({"val_pr" : wandb.plot.pr_curve(y_test, pred_wandb, labels=None, classes_to_plot=None)})

plt.show()
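One possible contributor to the mismatch, offered as a sketch rather than a confirmed diagnosis: `decision_function` returns unbounded scores, not probabilities, so `1 - y_score` is not a complementary probability, and the snippet puts the positive-class score in column 0. The hypothetical helper below squashes the scores with a logistic function (an uncalibrated choice that only preserves their ranking) and assumes column `i` corresponds to class `i`. The sample values are made up.

```python
import numpy as np


def scores_to_probas(scores):
    """Map raw decision scores to an (n, 2) array of pseudo-probabilities.

    Assumptions: the logistic squashing is uncalibrated and only keeps the
    ranking of the scores; column 0 is class 0 and column 1 is class 1.
    """
    s = np.asarray(scores, dtype=float).reshape(-1)
    p = 1.0 / (1.0 + np.exp(-s))           # monotonic map into (0, 1)
    return np.stack((1.0 - p, p), axis=1)  # shape (n, 2)


# Hypothetical stand-ins for y_score and the reshaped y_test from the snippet
y_score = np.array([-1.5, 0.3, 2.0])
y_test = np.array([[0], [1], [1]])

pred_wandb = scores_to_probas(y_score)
y_true = y_test.ravel()                    # keep the ground truth 1-D
print(y_true.shape, pred_wandb.shape)
```

Because the logistic map is monotonic, the class 1 precision-recall curve computed from `pred_wandb[:, 1]` has the same ranking as the one sklearn draws from the raw scores. `y_true` and `pred_wandb` would then go to `wandb.plot.pr_curve` as in the original call.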

mgiordy · March 7, 2023, 11:46am

Hey Ramit,

Any chance you had a look at this issue?

Thanks and best!

Marco

mgiordy · March 22, 2023, 12:31pm

Hello!

Any chance you had a look at this issue?

Best,
Marco

system · May 21, 2023, 12:32pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
