Grouping custom metrics by configuration

Hi, I started using WandB today together with PyTorch Lightning.

I am using a LightningModule which retrieves the input data and labels. I have also associated each input/output pair with a JSON configuration file that describes the capture environment (e.g., whether it is multi-host or single-host, bandwidth, delay, BDP factor). I know how to log values such as F1 score and accuracy for each sample, but I am confused about how to associate each value with a configuration. Are there any guides available that address this or something similar?

For instance, in the test_step of my LightningModule I have:

def test_step(self, batch, batch_idx):
    x, y, config_file = batch  # each sample also carries its config file
    y_pred = self.forward(x)
    loss = self.loss(y_pred, y)
    self.log("test/loss", loss)
    return loss

I would like to do something like this:

def test_step(self, batch, batch_idx):
    x, y, config_file = batch
    y_pred = self.forward(x)
    loss = self.loss(y_pred, y)
    self.log(
        "test/loss",
        loss,
        # hypothetical argument; self.log does not actually accept this
        configuration={
            "delay": "10ms",
            "BDP": 3,
            # etc ...
        },
    )
    return loss

Here, config_file is a JSON file describing the capture environment.

My guess is that I could probably change "test/loss" to something like "test/loss/10ms/3", but I am not sure this is the best way to go about it. How would the charts look in the W&B dashboard? I want to be able to compare different environment settings somehow.
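
For concreteness, I imagine the metric-name approach would look roughly like this (just a sketch; I am assuming here that the config collates into per-sample dicts with delay and BDP keys):

def test_step(self, batch, batch_idx):
    x, y, configs = batch  # assumed: configs is a list of per-sample config dicts
    y_pred = self.forward(x)
    loss = self.loss(y_pred, y)
    # Encode the environment into the metric key, e.g. "test/loss/10ms/3"
    cfg = configs[0]
    self.log(f"test/loss/{cfg['delay']}/{cfg['BDP']}", loss)
    return loss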

Thanks in advance!

Hi Kevin,
This is an interesting use case. Just so I can confirm what you are trying to do, let me try to sum this up.

You would like to capture the exact environment at the time that test/loss is logged, correct?

Does the environment change dynamically, or is it something you set at the beginning of the run?

Also, test/loss is logged multiple times within the same run, correct?

If this is the case, I don’t think logging it as “test/loss/10ms/3” would be the best way to do this, as you would then have a separate plot for each environment with only one point on it in the dashboard. What would you like a plot to look like on the dashboard after you log this?

Sorry for all of the questions. I just want to make sure I understand your goal before I try to come up with a solution.
Thank you,
Nate

Hi Nathan,
Thanks for the answer. The environment (i.e., the config) comes from the test data, so you could say that these environment variables are set at the beginning of the run.

Ultimately, I would like to plot some metric over grouped configuration values such as delay. So one plot might have F1 score on the y-axis and delay in ms on the x-axis.

I think I solved this by aggregating everything into a huge matrix and using the log_table syntax.
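
For reference, the aggregation looks roughly like this (a sketch; the column names and the results structure are just how I happened to organize things):

import wandb

# Collect one row per sample, then log everything as a single Table.
# Assumes an active wandb run (e.g. via the Lightning WandbLogger).
columns = ["delay", "BDP", "f1", "accuracy"]
table = wandb.Table(columns=columns)
for cfg, f1, acc in results:  # results gathered during the test loop
    table.add_data(cfg["delay"], cfg["BDP"], f1, acc)
wandb.log({"test/metrics_by_config": table})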

Hi @kevjn, I’m glad you found a way to make this work! If I’m understanding your use case correctly, there may be a way to simplify this a little bit.

If you set the wandb.config using the following at the beginning of the run, you can later group by certain config parameters in the UI to show just the runs with a certain config.

environment = {"delay":<delay_variable_from_data>, "BDP": <BDP_value>}
run = wandb.init(config=environment) 

All of the wandb.config is stored with that run in the UI and can be used to filter or group runs together.
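
For example, with one run per environment the metric itself stays a single key, and the grouping happens in the UI (a minimal sketch; the project name and variables are placeholders):

import wandb

environment = {"delay": delay_ms, "BDP": bdp_factor}  # placeholder variables
run = wandb.init(project="my-project", config=environment)
# ... run the test for this environment ...
run.log({"test/accuracy": accuracy})
run.finish()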

Here is a quick guide on wandb.config

Let me know if this helps!
Nate


Thank you for the suggestion. However, the problem I have with this approach is that I would need to create a new dataset for each configuration, or at least select only the samples that match the current environment. I have only one test dataset, and config values such as delay and BDP are tied to each individual sample, so BDP and delay often differ from sample to sample.

So while your approach is feasible if I structure my data better, I would prefer to do everything in one sweep and filter/group the values in retrospect.

I hope this makes sense, please let me know otherwise and I can try to clarify my intention better.

On second thought, it might be worth the effort to create a dataset for each environment and run the test multiple times on the same trained model. I am not sure how this would look in the end, though. What if I had 10 different values of delay and wanted to plot model accuracy for each respective value?

I tried illustrating it below :)

Accuracy
   ^
   |
   |
   |
   |
   +----------------------->
          Delay [ms]

@kevjn I love the illustration!

Ok, I see how this would be pretty hard to do without restructuring the data. I agree that using Tables to create something like this is probably the best solution:

Delay [ms] | BDP | Accuracy
10         |  3  |  0.89
15         |  4  |  0.90

Then you should be able to use that Table to generate a chart with delay on the x-axis and accuracy on the y-axis.
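
Something along these lines should work (a sketch using wandb.plot.scatter; the column names are assumed to match the table above):

import wandb

run = wandb.init()
table = wandb.Table(columns=["Delay", "BDP", "Accuracy"])
table.add_data(10, 3, 0.89)
table.add_data(15, 4, 0.90)
run.log({
    "accuracy_vs_delay": wandb.plot.scatter(
        table, "Delay", "Accuracy", title="Accuracy vs. Delay"
    )
})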

Is this what you are doing now?

Also, it sounds like your end goal is not training a model but rather comparing different environments on the same model, correct?

Thank you,
Nate

@nathank, I did log everything in a table at first, but I noticed it was a bit slow and required some manual labor to make the data presentable. There are about 30,000 rows, and I figured the best way is to create the plots myself in something like matplotlib and log the figures as wandb images.
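
In case it helps, the matplotlib route looks roughly like this (the data variables are placeholders for my aggregated results):

import matplotlib.pyplot as plt
import wandb

# Build the figure locally, then log it as a static image.
fig, ax = plt.subplots()
ax.plot(delays_ms, accuracies, marker="o")  # placeholder data
ax.set_xlabel("Delay [ms]")
ax.set_ylabel("Accuracy")
wandb.log({"accuracy_vs_delay": wandb.Image(fig)})  # assumes an active run
plt.close(fig)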

Regarding question 2):
Exactly, I have a training and testing pipeline, and in the testing pipeline I want to compare the model's accuracy in many different environments. The number of unique configurations explodes, so it is not really feasible to start a new run for each configuration.

@kevjn, sorry that going with Tables wasn’t a great solution for you. We are currently working on restructuring Tables in the hope of speeding up logging times.

I’m glad you found a working solution using Matplotlib, but if you have any feedback on how Tables could better support this use case, feel free to let me know and I can put in a feature request.

Thank you,
Nate
