How to show max performance and average across trials

So I’ve been using the default curves to monitor my RL experiments for a while now. They are very handy and easier to manage than my old .csv workflow. In my experiments, I have multiple runs with the same hyperparameters and they are organized into groups. What I’m trying to plot is this: create a bar plot of max performance averaged across trials for each group, versus the name of the group.

I tried creating a bar plot panel, and in general it looks like what I want: it lists all the groups of runs and automatically calculates an aggregated value for each, e.g. mean or median. But the plot seems to take the mean/median of all the values from the whole group (like the concatenation of all trials) instead of giving me a choice of, e.g., averaging over the max return of each run. Here is what the plot looks like:

I saw that there are custom tables, but I’m not quite sure how to use them. If it’s easy to write, can someone give some hints about how to get custom tables to do this for me? Lots of thanks.
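
To make the distinction in the question concrete, here is a minimal sketch in plain Python (no wandb calls). It contrasts the mean over all concatenated values in a group with the mean of each run's maximum, which is the quantity being asked for. The group names and return values are made up for illustration only.

```python
from statistics import mean

# Hypothetical per-step returns: group name -> list of runs -> list of values.
groups = {
    "lr_1e-3": [[1.0, 3.0, 2.0], [2.0, 5.0, 4.0]],
    "lr_1e-4": [[0.0, 1.0, 1.0], [1.0, 2.0, 3.0]],
}

def mean_of_all_values(runs):
    """Mean over the concatenation of every run's values."""
    return mean(v for run in runs for v in run)

def mean_of_run_maxima(runs):
    """Mean of the best value reached by each individual run."""
    return mean(max(run) for run in runs)

pooled = {name: mean_of_all_values(runs) for name, runs in groups.items()}
per_run_max = {name: mean_of_run_maxima(runs) for name, runs in groups.items()}

for name in groups:
    print(f"{name}: pooled mean={pooled[name]:.2f}, "
          f"mean of maxima={per_run_max[name]:.2f}")
```

The two aggregates differ because pooling mixes early, low-return steps with the peak, while the second one first reduces each run to a single number and only then averages across trials.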

Hi @aceticia! Welcome to our W&B forums and glad you’re enjoying using W&B for helping with your RL experiments.

This is a workflow we plan to make much easier in the future, so that you can get the min, max, etc. of a metric for a given run or group of runs. In the meantime, you can log the max value at the end of each run and take the mean or median of that across the group. We have a convenience method called define_metric that does this for you automatically.

import wandb

wandb.init(entity="wandb", project="define-metric-demo")
# define a metric we are interested in the minimum of
wandb.define_metric("loss", summary="min")
# define a metric we are interested in the maximum of
wandb.define_metric("acc", summary="max")

This will then log the max/min value of a given logged metric for you.
Here’s more documentation of this method:


Thank you for the quick reply! This looks easy to change in my code. But for experiments that already happened, I assume there is no easy fix?

The reason I’m asking is that I’m currently using an RL platform called RLlib, which handles all wandb-related code. I can’t make the wandb.define_metric call without breaking their interface. I tried calling init and define_metric before their init call, but then nothing gets logged.

In the future, this should be a lot easier and available from the GUI after experiments have finished, but for now, logging it yourself is the best option.
I’ll reach out to people on the team who might know more about the RLlib integration. If that isn’t fruitful, it might be worth opening an issue with the library maintainer (you could tag me, @scottire on GitHub) to see if they have any recommendations.
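
For anyone wondering what logging it yourself could look like, here is a rough sketch rather than an official recipe. It keeps a running maximum and writes it into a summary mapping. The function name track_running_max and the metric key are hypothetical; in a real run the mapping would be wandb.run.summary, which assumes the run object is reachable from your own code (this may not hold when an integration owns the wandb calls). A plain dict stands in for it here so the sketch runs without a W&B account.

```python
def track_running_max(values, summary, key="episode_reward_mean"):
    """Store the best value seen so far under `<key>_max` in `summary`.

    `summary` is any mutable mapping. In a real run it would be
    `wandb.run.summary` (assumption: the run is reachable from your code).
    """
    best = float("-inf")
    for value in values:
        # In a real training loop you would also call wandb.log({key: value}) here.
        if value > best:
            best = value
            summary[f"{key}_max"] = best
    return best

# Stand-in for wandb.run.summary so the sketch is self-contained.
fake_summary = {}
best = track_running_max([10.0, 42.5, 37.0], fake_summary)
print(fake_summary)
```

Once each run carries its maximum as a single summary value, a bar plot panel grouped by run group should be able to average that value across trials instead of pooling every logged step.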

Looking at this documentation: Using Weights & Biases with Tune — Ray v1.8.0
It seems possible to use the @wandb_mixin to access wandb. Have you been able to try that?

I have the exact same problem using PyTorch Lightning and WandbLogger. I am logging the metrics within my (general) PL framework, so it is not at all clear to me how to change the summary mode. Any suggestion? Thanks!


@nsacco In the case of PyTorch Lightning, you can hook in using a callback to ask wandb to log a summary metric.

Here’s an example callback that you can give to your trainer that will do that:

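import pytorch_lightning as pl
import wandb

class LogMinLossCallback(pl.Callback):
    def on_train_epoch_start(self, trainer, pl_module):
        # Register the summary once, at the start of the first epoch.
        if trainer.current_epoch == 0:
            wandb.define_metric("train/loss_step", summary="min")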

This will add a new train/loss_step.min metric, the minimum loss, to be tracked by wandb.
