Aggregating a table across runs


I have logged a confusion matrix to W&B. I now have the confusion matrix for every experiment (as a wandb table), and I can present them as columns in a joint plot.

I would like to get the average values over (let’s say) all the experiments from the last 10 days (the average True Positive count, the average False Positive count, etc.).
So I tried to aggregate the 10 confusion tables together in the W&B platform, without any progress…

I also tried adding an aggregate operation to the relevant Vega spec [Confusion matrix V1 / V2] in place of the existing joinaggregate, but then the resulting Vega plot looks weird.

I would like some help:
If someone knows how to aggregate the table values over different runs, with a mean operation at the end, that would be enough for me for now (the visualization part is less important; I just want to present the “reduced / avg” table instead of making each value a number panel and aggregating manually😁)

Thanks a lot!

Hello, this is a reply from our support bot designed to assist you with your Weights & Biases related queries. To reach a human please reply to this message.

Aggregating confusion matrix data across multiple runs in W&B to calculate average values of metrics like True Positive and False Positive involves a few steps. While the documentation does not provide a direct method for aggregating tables across runs, you can achieve this by using the W&B API to fetch the data from each run, perform the aggregation manually in Python, and then log the aggregated results as a new table or visualization. Here’s a general approach to accomplish this:

  1. Fetch Runs Data: Use the W&B API to fetch the runs from your project that you’re interested in aggregating. You can filter runs based on tags, dates, or other criteria relevant to your experiments.

  2. Extract Confusion Matrices: For each run, extract the confusion matrix data. This step depends on how you’ve logged the confusion matrix (e.g., as a W&B Table or as custom metrics).

  3. Aggregate Data: Once you have the confusion matrices, you can aggregate the values manually in Python. For example, if you have the True Positive (TP) and False Positive (FP) values from each confusion matrix, you can calculate their averages across all selected runs.

  4. Log Aggregated Results: After calculating the average TP, FP, etc., you can log these aggregated results back to W&B. You could log them as a new table or use W&B’s custom charts feature to visualize the aggregated confusion matrix.
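For step 1, a "last 10 days" window can be expressed as a filter on the run creation time. The following is a minimal sketch: the helper name `created_since_filter` is hypothetical, and the exact timestamp format the API accepts for `created_at` is an assumption you should verify against your own project.

```python
from datetime import datetime, timedelta, timezone


def created_since_filter(days, now=None):
    """Build a MongoDB-style filter for runs created in the last `days` days.

    `now` can be passed explicitly to make the result reproducible.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Assumption: created_at is compared against an ISO-like timestamp string
    return {"created_at": {"$gt": cutoff.strftime("%Y-%m-%dT%H:%M:%S")}}


# Fixed "now" so the example always yields the same cutoff
example_filter = created_since_filter(
    10, now=datetime(2024, 1, 11, tzinfo=timezone.utc)
)
```

The resulting dictionary could then be passed as the `filters` argument, for example `api.runs("your_username/your_project_name", filters=created_since_filter(10))`.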

Here’s a simplified code example to illustrate steps 1 to 3:

import wandb
import numpy as np

# Initialize W&B API client
api = wandb.Api()

# Fetch runs from your project
project_runs = api.runs("your_username/your_project_name", {"created_at": {"$gt": "2023-01-01"}})

# Placeholder for confusion matrix values
tps, fps = [], []

# Iterate over runs and extract confusion matrix data
for run in project_runs:
    # This is a placeholder; you'll need to adjust it based on how you've logged your data
    tp = run.summary.get("true_positive")
    fp = run.summary.get("false_positive")
    if tp is not None and fp is not None:
        # Keep only runs that logged both values
        tps.append(tp)
        fps.append(fp)

# Calculate average TP and FP
avg_tp = np.mean(tps)
avg_fp = np.mean(fps)

# Log aggregated results (you can also log this as a new table or visualization)
wandb.init(project="your_project_name", job_type="aggregate")
wandb.log({"avg_true_positive": avg_tp, "avg_false_positive": avg_fp})

Note: This example assumes you’ve logged True Positive and False Positive values as summary metrics in each run. You’ll need to adjust the code to match how you’ve structured your data.
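If the values exist only inside a logged table, the same idea applies cell by cell. The sketch below assumes you have already pulled each run's table into plain Python rows of (actual class, predicted class, count); how you fetch those rows depends on how the table was logged, so that part is left out. The function name `average_confusion_tables` and the sample data are hypothetical.

```python
from collections import defaultdict


def average_confusion_tables(tables):
    """Average confusion-matrix cells across runs.

    `tables` holds one entry per run; each entry is a list of
    (actual, predicted, count) rows. A cell that is missing from
    a run is treated as a count of 0 for that run.
    """
    if not tables:
        raise ValueError("need at least one table")
    totals = defaultdict(float)
    for rows in tables:
        for actual, predicted, count in rows:
            totals[(actual, predicted)] += count
    n_runs = len(tables)
    return {cell: total / n_runs for cell, total in totals.items()}


# Made-up rows for two runs, for illustration only
run_a = [("cat", "cat", 8), ("cat", "dog", 2), ("dog", "cat", 1), ("dog", "dog", 9)]
run_b = [("cat", "cat", 6), ("cat", "dog", 4), ("dog", "cat", 3), ("dog", "dog", 7)]
averaged = average_confusion_tables([run_a, run_b])
```

Dividing by the number of runs (rather than by how many runs contain a given cell) keeps every cell on the same scale, which seems the more natural choice for a "mean confusion matrix"; change it if your runs use different class sets.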

For visualizing aggregated confusion matrix data, you might consider logging the aggregated results as a custom chart or table in W&B, depending on your preference. Unfortunately, the provided documentation does not detail specific methods for aggregating tables directly in the W&B UI or using Vega for custom visualizations. If you encounter issues with Vega visualizations, it might be beneficial to reach out to W&B support or the community forums for more targeted assistance.
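As a rough sketch of the "log it as a table" option: given a dictionary of averaged cells keyed by (actual, predicted), you could flatten it into rows and log a single table from a separate aggregation run. The helper names, the `avg_confusion_matrix` key, and the column names are assumptions; the columns are chosen to resemble what `wandb.plot.confusion_matrix` produces, so check them against your own logged tables.

```python
def confusion_rows(averaged):
    """Turn {(actual, predicted): mean_count} into sorted table rows."""
    return [[actual, predicted, value]
            for (actual, predicted), value in sorted(averaged.items())]


def log_average_table(averaged, project):
    """Log the averaged matrix as one W&B table (needs wandb and a login)."""
    import wandb  # imported lazily so confusion_rows works without wandb

    run = wandb.init(project=project, job_type="aggregate")
    table = wandb.Table(
        columns=["Actual", "Predicted", "nPredictions"],  # assumed column names
        data=confusion_rows(averaged),
    )
    run.log({"avg_confusion_matrix": table})
    run.finish()


# Made-up averaged values, for illustration only
rows = confusion_rows({
    ("dog", "dog"): 8.0,
    ("cat", "dog"): 3.0,
    ("cat", "cat"): 7.0,
    ("dog", "cat"): 2.0,
})
```

Calling `log_average_table(averaged, "your_project_name")` would then create one extra run holding only the averaged table, which you can add to a workspace or report like any other table.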

Remember, this approach requires manual data handling and aggregation in Python before logging the results back to W&B. It provides flexibility in how you aggregate and visualize your metrics but might require some iteration to get the exact output you’re looking for.

-WandBot :robot:

Hi @tankwell,

I wanted to follow up here on our wandbot response.

Do you have these values logged as metrics as well, or are they only logged as part of a table?

If they are also logged as metrics, you could create a Scalar Chart, calculating the Mean (or median) across the selected Runs (see this Workspace)