Rich reductions when aggregating metrics

I need to aggregate logged data in a special way, involving sequential means and maxes over different config values.

In my case, I have one WandB pretraining run for each value from a grid of hyperparameters and seeds. Each of these runs generates multiple checkpoints. For each of these checkpoints, I launch multiple evaluation WandB runs, each with a different set of evaluation hyperparameters. The pretraining checkpoint and the pretraining hyperparameter grid values are logged as part of the config of each evaluation run.

Thus, for each element of the pretraining hyperparameter grid, I would like to look at the average over seeds, the max over pretraining checkpoints, and the max over evaluation hyperparameters. I can do the necessary math in my own code (Pandas is mostly up to the task), but I’d love to be able to do this in the WandB dashboard.
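
Concretely, the reduction I have in mind looks something like the following Pandas sketch; column names such as pretrain_lr, eval_lr, and metric are just placeholders for my actual config keys and logged metric:

    import pandas as pd

    # One row per evaluation run; every config value is a column (placeholder names).
    df = pd.DataFrame({
        "pretrain_lr": [1e-4, 1e-4, 1e-4, 1e-4, 3e-4, 3e-4],  # pretraining grid value
        "seed":        [0,    1,    0,    1,    0,    1],
        "checkpoint":  [1000, 1000, 2000, 2000, 1000, 1000],
        "eval_lr":     [1e-5, 1e-5, 3e-5, 3e-5, 1e-5, 1e-5],  # evaluation hyperparameter
        "metric":      [0.61, 0.63, 0.70, 0.66, 0.58, 0.60],
    })

    # 1. Average over seeds for each (grid value, checkpoint, eval hyperparameter).
    seed_mean = df.groupby(["pretrain_lr", "checkpoint", "eval_lr"])["metric"].mean()

    # 2. Max over pretraining checkpoints.
    ckpt_max = seed_mean.groupby(level=["pretrain_lr", "eval_lr"]).max()

    # 3. Max over evaluation hyperparameters, leaving one value per grid element.
    per_grid_value = ckpt_max.groupby(level="pretrain_lr").max()
    print(per_grid_value)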

Is there a way I can do this?

Thanks!

Thank you for your question about complex metric aggregation in W&B. While W&B offers powerful hyperparameter optimization and visualization tools, the specific multi-step aggregation you described (sequential means and maxes over different configuration values) isn’t directly supported in the W&B dashboard or sweeps functionality. However, I can suggest some approaches that may help you achieve your goal.

First, let’s clarify what W&B can do out-of-the-box:

  • Log metrics and hyperparameters from your runs (a minimal logging sketch follows this list)
  • Visualize and compare runs with grouping, filtering, and basic aggregation
  • Optimize sweeps for specific metrics
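
As a reference point for the first item, here is a minimal sketch of what the logging side of one of your evaluation runs might look like; the config keys (seed, checkpoint, hyperparam1) and the metric name (your_metric) are placeholders and just need to match what the analysis code further down reads back:

    import wandb

    # Hypothetical evaluation run: record the pretraining grid value, seed, and
    # checkpoint in the config so they can be filtered and grouped on later.
    run = wandb.init(
        project="your-project",
        job_type="evaluation",
        config={
            "seed": 0,
            "checkpoint": 1000,
            "hyperparam1": 3e-4,   # pretraining grid value
            "eval_lr": 1e-5,       # evaluation hyperparameter
        },
    )

    # Log the evaluation metric; the last logged value ends up in run.summary.
    wandb.log({"your_metric": 0.63})
    run.finish()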

For more complex analyses like the one you’ve described, you’ll need to combine W&B’s data logging capabilities with custom analysis. Here’s an approach you could take:

  1. Log all relevant metrics and configuration values during your runs using wandb.log().

  2. Use the W&B API to retrieve your logged data.

  3. Perform your custom aggregations using a library like Pandas.
    Here’s an expanded example of how you might do this:

    import wandb
    import pandas as pd

    api = wandb.Api()
    runs = api.runs("your-entity/your-project")

    # Fetch data from W&B
    data = []
    for run in runs:
        # Assuming you've logged all relevant config values and the metric
        data.append({
            "seed": run.config.get("seed"),
            "checkpoint": run.config.get("checkpoint"),
            "hyperparam1": run.config.get("hyperparam1"),
            "metric": run.summary.get("your_metric"),
        })

    df = pd.DataFrame(data)

    # Average over seeds for each (hyperparameter, checkpoint) pair
    mean_df = df.groupby(["hyperparam1", "checkpoint"])["metric"].mean().reset_index()

    # Max over checkpoints; max_df holds one value per element of the grid
    max_df = mean_df.groupby("hyperparam1")["metric"].max().reset_index()

    # Optionally, reduce further to the single best value across the grid
    final_result = max_df["metric"].max()
    print(f"Final result after aggregation: {final_result}")

    # Optionally, log this result back to W&B
    with wandb.init(project="your-project", job_type="analysis") as run:
        wandb.log({"aggregated_metric": final_result})

This script demonstrates the full pipeline of fetching data from W&B, performing your desired aggregations, and even logging the result back to W&B if you wish.

While this requires some additional code, it allows you to leverage both W&B’s robust experiment tracking and the full flexibility of custom analysis.

Some additional suggestions:

  1. Consider using W&B Tables to log structured data, which can make retrieval and analysis easier (see the sketch after this list).
  2. Explore W&B’s built-in visualization tools like parallel coordinates plots or custom charts for insights that don’t require complex aggregation.
  3. For frequently used analyses, you could create a custom script or Jupyter notebook that pulls data from W&B and generates your desired aggregations and visualizations.
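
To illustrate the first suggestion, here is a rough sketch of logging evaluation results as a W&B Table; the column names and values are placeholders:

    import wandb

    run = wandb.init(project="your-project", job_type="evaluation")

    # A Table keeps one row per evaluation configuration, which is easy to pull
    # back out via the API or to query in the dashboard.
    table = wandb.Table(columns=["seed", "checkpoint", "hyperparam1", "eval_lr", "metric"])
    table.add_data(0, 1000, 3e-4, 1e-5, 0.63)
    table.add_data(0, 2000, 3e-4, 1e-5, 0.70)

    run.log({"eval_results": table})
    run.finish()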

I hope this helps provide a path forward! Let me know if you have any questions about implementing this approach or if you’d like to explore other ways to analyze your data within W&B.

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Thanks! I will look into my own data post-processing.