Rich reductions when aggregating metrics

I need to aggregate logged data in a special way, involving sequential means and maxes over different config values.

In my case, I have one WandB pretraining run for each value from a grid of hyperparameters and seeds. Each of these runs generates multiple checkpoints. For each of these checkpoints, I launch multiple evaluation WandB runs, each with a different set of evaluation hyperparameters. The pretraining checkpoint and the pretraining hyperparameter grid values are logged as part of the config of each evaluation run.

Thus, for each element of the pretraining hyperparameter grid, I would like to look at the average over seeds, the max over pretraining checkpoints, and the max over evaluation hyperparameters. I can do the necessary math in my own code (Pandas is mostly up to the task), but I’d love to be able to do this in the WandB dashboard.
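
Concretely, the reduction I have in mind looks something like the following Pandas sketch; column names such as pretrain_lr, eval_lr, and metric are just placeholders for my actual config keys and logged metric:

    import pandas as pd

    # One row per evaluation run; every config value is a column (placeholder names).
    df = pd.DataFrame({
        "pretrain_lr": [1e-4, 1e-4, 1e-4, 1e-4, 3e-4, 3e-4],  # pretraining grid value
        "seed":        [0,    1,    0,    1,    0,    1],
        "checkpoint":  [1000, 1000, 2000, 2000, 1000, 1000],
        "eval_lr":     [1e-5, 1e-5, 3e-5, 3e-5, 1e-5, 1e-5],  # evaluation hyperparameter
        "metric":      [0.61, 0.63, 0.70, 0.66, 0.58, 0.60],
    })

    # 1. Average over seeds for each (grid value, checkpoint, eval hyperparameter).
    seed_mean = df.groupby(["pretrain_lr", "checkpoint", "eval_lr"])["metric"].mean()

    # 2. Max over pretraining checkpoints.
    ckpt_max = seed_mean.groupby(level=["pretrain_lr", "eval_lr"]).max()

    # 3. Max over evaluation hyperparameters, leaving one value per grid element.
    per_grid_value = ckpt_max.groupby(level="pretrain_lr").max()
    print(per_grid_value)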

Is there a way I can do this?

Thanks!

Thank you for your question about complex metric aggregation in W&B. While W&B offers powerful hyperparameter optimization and visualization tools, the specific multi-step aggregation you described (sequential means and maxes over different configuration values) isn’t directly supported in the W&B dashboard or sweeps functionality. However, I can suggest some approaches that may help you achieve your goal.

First, let’s clarify what W&B can do out-of-the-box:

  • Log metrics and hyperparameters from your runs (a minimal logging sketch follows this list)
  • Visualize and compare runs with grouping, filtering, and basic aggregation
  • Optimize sweeps for specific metrics
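
As a reference point for the first item, here is a minimal sketch of what the logging side of one of your evaluation runs might look like; the config keys (seed, checkpoint, hyperparam1) and the metric name (your_metric) are placeholders and just need to match what the analysis code further down reads back:

    import wandb

    # Hypothetical evaluation run: record the pretraining grid value, seed, and
    # checkpoint in the config so they can be filtered and grouped on later.
    run = wandb.init(
        project="your-project",
        job_type="evaluation",
        config={
            "seed": 0,
            "checkpoint": 1000,
            "hyperparam1": 3e-4,   # pretraining grid value
            "eval_lr": 1e-5,       # evaluation hyperparameter
        },
    )

    # Log the evaluation metric; the last logged value ends up in run.summary.
    wandb.log({"your_metric": 0.63})
    run.finish()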

For more complex analyses like the one you’ve described, you’ll need to combine W&B’s data logging capabilities with custom analysis. Here’s an approach you could take:

  1. Log all relevant metrics and configuration values during your runs using wandb.log().

  2. Use the W&B API to retrieve your logged data.

  3. Perform your custom aggregations using a library like Pandas.
    Here’s an expanded example of how you might do this:

    import wandb
    import pandas as pd

    api = wandb.Api()
    runs = api.runs("your-entity/your-project")

    # Fetch data from W&B
    data = []
    for run in runs:
        # Assuming you've logged all relevant config values and the metric
        data.append({
            "seed": run.config.get("seed"),
            "checkpoint": run.config.get("checkpoint"),
            "hyperparam1": run.config.get("hyperparam1"),
            "metric": run.summary.get("your_metric"),
        })

    df = pd.DataFrame(data)

    # Average over seeds for each (hyperparameter, checkpoint) pair
    mean_df = df.groupby(["hyperparam1", "checkpoint"])["metric"].mean().reset_index()

    # Max over checkpoints; max_df holds one value per element of the grid
    max_df = mean_df.groupby("hyperparam1")["metric"].max().reset_index()

    # Optionally, reduce further to the single best value across the grid
    final_result = max_df["metric"].max()
    print(f"Final result after aggregation: {final_result}")

    # Optionally, log this result back to W&B
    with wandb.init(project="your-project", job_type="analysis") as run:
        wandb.log({"aggregated_metric": final_result})

This script demonstrates the full pipeline of fetching data from W&B, performing your desired aggregations, and even logging the result back to W&B if you wish.

While this requires some additional code, it allows you to leverage both W&B’s robust experiment tracking and the full flexibility of custom analysis.

Some additional suggestions:

  1. Consider using W&B Tables to log structured data, which can make retrieval and analysis easier (see the sketch after this list).
  2. Explore W&B’s built-in visualization tools like parallel coordinates plots or custom charts for insights that don’t require complex aggregation.
  3. For frequently used analyses, you could create a custom script or Jupyter notebook that pulls data from W&B and generates your desired aggregations and visualizations.
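
To illustrate the first suggestion, here is a rough sketch of logging evaluation results as a W&B Table; the column names and values are placeholders:

    import wandb

    run = wandb.init(project="your-project", job_type="evaluation")

    # A Table keeps one row per evaluation configuration, which is easy to pull
    # back out via the API or to query in the dashboard.
    table = wandb.Table(columns=["seed", "checkpoint", "hyperparam1", "eval_lr", "metric"])
    table.add_data(0, 1000, 3e-4, 1e-5, 0.63)
    table.add_data(0, 2000, 3e-4, 1e-5, 0.70)

    run.log({"eval_results": table})
    run.finish()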

I hope this helps provide a path forward! Let me know if you have any questions about implementing this approach or if you’d like to explore other ways to analyze your data within W&B.

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Thanks! I will look into my own data post-processing.