My task is binary classification, and my data is in a numpy array of shape [m, n], where m is the number of samples and n the number of classes; in my case n = 1 (i.e. [128, 1]).
I am encountering the following error:
File "/home/mgiordano/.pyenv/versions/3.8.11/envs/sepsis/lib/python3.8/site-packages/wandb/plot/roc_curve.py", line 74, in roc_curve
y_true, y_probas[..., i], pos_label=classes[i]
IndexError: index 1 is out of bounds for axis 1 with size 1
I think Wandb is trying to compute the curves for other classes that are not there. Am I missing something?
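A minimal sketch of how this error can arise, assuming (from the traceback only, not from reading the wandb source) that the function loops over the unique labels in y_true and takes one column of y_probas per class. The array values are made up for illustration.

```python
import numpy as np

# Binary labels and a single-column prediction array, as in the [128, 1] case
y_true = np.array([0, 1, 1, 0])
y_probas = np.array([[0.2], [0.9], [0.7], [0.4]])  # shape [4, 1]

# Assumption: one curve is computed per unique label found in y_true
classes = np.unique(y_true)  # two classes: 0 and 1

message = ""
try:
    for i in range(len(classes)):
        # Same indexing pattern as in the traceback: y_probas[..., i]
        column = y_probas[..., i]
except IndexError as err:
    message = str(err)

print(message)
```

With two labels but only one prediction column, the second iteration asks for a column that does not exist, which matches the reported IndexError.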
Hi @mgiordy, happy to help. Could you verify the shape of the arrays that you are passing to the plotting function? We’ll review the ROC chart function for any errors and get back to you.
Hey thanks for getting back
Can it be that it expects a [n,2] array for the prediction and a [n] array with the ground truth? In that case no error is reported, otherwise if I pass the same format to both I get the following error: ValueError: multilabel-indicator format is not supported.
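A small sketch of the two shapes described above, under the assumption that the ground truth should hold one class index per sample while the predictions hold one column per class. If the labels are stored one-hot in the same [n, 2] layout as the predictions, they can be collapsed with argmax. The values are illustrative, and the wandb call is only mentioned in a comment, not executed.

```python
import numpy as np

# Ground truth stored one-hot, i.e. in the same [n, 2] layout as the predictions
y_true_onehot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Predictions: one column per class, ordered [class 0, class 1]
y_probas = np.array([[0.8, 0.2], [0.1, 0.9], [0.3, 0.7], [0.6, 0.4]])

# Collapse the one-hot rows to class indices, giving a 1-D array of shape [n]
y_true = y_true_onehot.argmax(axis=1)

print(y_true.shape, y_probas.shape)

# These would then be the arguments to the call discussed in this thread:
# wandb.plot.roc_curve(y_true, y_probas)
```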
Hey! Yeah now it works
However, the visualisation on the wandb website is kinda off… The ROC curves had fpr and tpr on the wrong axes (I’ve fixed it, but shouldn’t the software be able to show it by default?), while the PR curve just looks wrong. Please note that sklearn is showing them correctly…
Thanks for letting us know! Could you share a code snippet with a reproduction of the broken PR chart and what you would have expected to see? I can take that information back to our engineering team to have this fixed.
Hi @mgiordy, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
Please find the code snippet to reproduce the problem at the end of this message.
I hope we can sort out the issue
Best,
Marco
# Importing stuff
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt
import wandb
wandb_project = "test_proj"
wandb.init(project=wandb_project)
# Loading dataset
X, y = load_iris(return_X_y=True)
# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)
# Limit to the two first classes, and split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X[y < 2], y[y < 2], test_size=0.5, random_state=random_state
)
# Scaling data and fitting classifier
classifier = make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
classifier.fit(X_train, y_train)
# Getting the prediction on test set
y_score = classifier.decision_function(X_test)
# Displaying PR curve with matplotlib
display = PrecisionRecallDisplay.from_predictions(y_test, y_score, name="LinearSVC")
_ = display.ax_.set_title("2-class Precision-Recall curve")
# Adding one dimension to the prediction array as discussed
ones = np.ones(y_test.shape)
pred_wandb = np.stack((y_score, ones - y_score), axis=1)
y_test = y_test[:, None]
print("Y test and Y pred dimensions:", y_test.shape, pred_wandb.shape)
# Logging the PR with wandb
wandb.log({"val_pr" : wandb.plot.pr_curve(y_test, pred_wandb, labels=None, classes_to_plot=None)})
plt.show()
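One thing that may be worth checking in the snippet above, though it is not confirmed anywhere in this thread as the cause: decision_function returns signed scores rather than probabilities, and np.stack((y_score, ones - y_score), axis=1) puts the class-1 score in column 0. The sketch below assumes the plotting function pairs column i with class i. The helper scores_to_class_columns is hypothetical, and the logistic squashing is only a monotonic rescaling, not a calibrated probability.

```python
import numpy as np

def scores_to_class_columns(scores):
    """Hypothetical helper: map signed scores to an [n, 2] array ordered [class 0, class 1]."""
    scores = np.asarray(scores, dtype=float).ravel()
    p_pos = 1.0 / (1.0 + np.exp(-scores))  # logistic squashing, not calibrated
    return np.column_stack((1.0 - p_pos, p_pos))

# Made-up decision scores; a positive score favours class 1
y_score = np.array([-1.5, 0.3, 2.0, -0.2])

as_posted = np.stack((y_score, 1 - y_score), axis=1)  # layout used in the snippet
reordered = scores_to_class_columns(y_score)          # layout ordered by class index

# Which column "wins" for each sample under each layout
print(as_posted.argmax(axis=1))
print(reordered.argmax(axis=1))
```

In the reordered layout the winning column agrees with the sign of the score for every sample, while in the posted layout it does not, so the two layouts would produce different per-class curves.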