I have several finetuning runs that evaluate checkpoints of other pretraining runs. Pretraining run A creates checkpoints A-1 and A-2 in epochs 1 and 2, and I have one finetuning run for each checkpoint of A. I would like to create a chart with the pretraining epoch on the x-axis (I can get this from a config file), not the finetuning epoch, and on the y-axis a test metric that I compute once at the end of every finetuning run. This is easy to get with a scatter plot, but I would also like to group finetuning runs by pretraining run (so by A, which has several separate finetuning runs associated with it, not by the pretraining checkpoint A-1, for which there is only 1 finetuning run). I have this info in a config file too. Each line in the resulting line chart would then have points from different finetuning runs, corresponding to different epochs of the same pretraining run. Is it possible to group runs in a line chart like that?
Hi @rubencart, thank you for writing in! In the project’s workspace, if you edit your line chart there is a `Grouping` tab, and once you toggle on the `Runs` option there you should see the `Group by` drop-down menu with your parameters. Would this work for you? If that’s not what you’re looking for, would it be possible to share a minimal code snippet that mimics such a workspace, or attach a screenshot of the current plot you have?
I don’t immediately know how I would come up with a code snippet.
But what you suggest does not work. I can add a scatter plot with the epoch of the pretraining checkpoint (stored in hyperparameter) on the x-axis, and my test metric on the y-axis, but there is no ‘Grouping’ tab (see screenshot 1).
Alternatively, I can start a line plot, which has a Grouping tab, but it does not let me put the hyperparameter denoting the pretraining epoch on the x-axis. Note that these are not the epoch values from finetuning runs in the current wandb project; they are the epoch numbers of the pretraining checkpoints (from another project) that the finetuning runs in this project initialized their weights from. Additionally, when I select a test metric (instead of a val metric) for the y-axis, the web app automatically turns the chart into a bar chart, since all runs logged only 1 value for that metric. See screenshot 2.
Thanks for your help!
Hi @rubencart, thanks for the screenshots and the additional information. Just to confirm: would it be correct that you want to plot `finetune.pretrained_epoch` on the x-axis and `test_id_maj_vote_f1_macro` on the y-axis? Do the test metrics have history values like the val metric does? That might explain why you’re seeing a bar instead of values per step.
Also, regarding the pretraining and finetuning projects: are the values you want to plot logged in this project? It won’t be possible in a project workspace to get values from another project (pretraining) unless you have logged them in the finetuning project. The only place where you can combine multiple projects is in Reports.
That’s correct indeed. They don’t have history values. In the plot that I want, one line would consist of points corresponding to different finetuning runs, each starting from a different checkpoint (a different `finetune.pretrained_epoch`) of the same pretraining run (the name of the pretraining run is also saved in a hyperparameter to which I have access). The y-values would indeed be `test_id_maj_vote_f1_macro` or a similar metric, of which I only have 1 value per finetuning run.
All values I want to plot are logged or saved as hyperparameter in this current finetuning project.
Just to be clear:
- I have some pretraining runs, e.g. A, B, with different checkpoints per pretraining run: A.1, A.2, B.1, B.2.
- I have, in the current wandb project, several finetuning runs, e.g. f, g, h, k.
- Each finetuning run starts from a pretraining checkpoint. The name of the pretraining run and the epoch number of the checkpoint are both saved in hyperparameters of the finetuning run: `finetune.all_backbone_ckpts_in_dir` and `finetune.pretrained_epoch`, respectively.
- All finetuning runs log a number of test metrics a single time at the end of their training, e.g. `test_id_maj_vote_f1_macro`. These metrics don’t have a history. I have val metrics with a history too, but I am not interested in those for this plot.
- Let’s say finetuning run f starts from checkpoint A.1 (so `finetune.all_backbone_ckpts_in_dir == A` and `finetune.pretrained_epoch == 1`), g from A.2, h from B.1 and k from B.2.
- Now what I want is a line plot where one line represents finetuning runs f and g, and hence pretraining run A. It connects the point (x: `finetune.pretrained_epoch == 1`, y: `test_id_maj_vote_f1_macro` logged by f) with the point (x: `finetune.pretrained_epoch == 2`, y: `test_id_maj_vote_f1_macro` logged by g). The second line connects (x: `finetune.pretrained_epoch == 1`, y: `test_id_maj_vote_f1_macro` logged by h) and (x: `finetune.pretrained_epoch == 2`, y: `test_id_maj_vote_f1_macro` logged by k).
- Just to repeat: finetuning run f has saved in its hyperparameters the name of its pretraining run A and the pretraining epoch 1, and it logged the test metric we want to plot.
- The finetuning epochs are irrelevant. What I want to plot is the evolution of `test_id_maj_vote_f1_macro`, obtained by finetuning on a downstream task, over the course of pretraining (to see e.g. the optimal or minimal necessary number of pretraining epochs).
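The grouping described in the list above can be sketched in a few lines of plain Python. This is only an illustration of the intended data layout, not something the wandb UI does: the record field names (`pretrain_run`, `pretrained_epoch`, `test_f1`) are hypothetical shorthand for the hyperparameters and metric named above, and the metric values are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical records, one per finetuning run (f, g, h, k).
# The test_f1 numbers are placeholders, not real results.
finetuning_runs = [
    {"name": "g", "pretrain_run": "A", "pretrained_epoch": 2, "test_f1": 0.74},
    {"name": "f", "pretrain_run": "A", "pretrained_epoch": 1, "test_f1": 0.71},
    {"name": "h", "pretrain_run": "B", "pretrained_epoch": 1, "test_f1": 0.65},
    {"name": "k", "pretrain_run": "B", "pretrained_epoch": 2, "test_f1": 0.69},
]


def group_into_lines(records):
    """Return {pretraining run: [(pretraining epoch, test metric), ...]}.

    Each key becomes one line in the chart; each (x, y) point comes from a
    different finetuning run. Points are sorted by pretraining epoch.
    """
    lines = defaultdict(list)
    for record in records:
        point = (record["pretrained_epoch"], record["test_f1"])
        lines[record["pretrain_run"]].append(point)
    return {run: sorted(points) for run, points in lines.items()}


lines = group_into_lines(finetuning_runs)
for pretrain_run, points in sorted(lines.items()):
    print(pretrain_run, points)
```

Line A then holds the points logged by f and g, and line B the points logged by h and k.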
If the existing scatter plot interface allowed grouping runs by a hyperparameter (and connecting each group with a line), or if the existing line plot interface allowed putting a hyperparameter on the x-axis and did not automatically turn into a bar chart for metrics without a history, this would be very easy to achieve, I guess?
Let me know if anything is unclear.
For the record, these are the types of plots I’m interested in. The different dots on 1 line each correspond to an entire finetuning run.
2nd example. Now the lines represent groups of pretraining runs and hence each dot represents a number of finetuning runs.
@thanos-wandb any thoughts?
Hi @rubencart, following up on this for Thanos as he is out of office. I reviewed the thread, and at this time it isn’t possible to be that selective about which data points, and from which runs, are plotted within a single line. The following options are available to produce the desired plots:
- After completing your training runs, log the specific data points of interest to a table and use a Weave panel to plot the individual points as lines.
- Produce the line plot outside of wandb using Matplotlib and log the graphs as images or charts, more on this here.
- As wandb plots are built on Vega, you could attempt to produce a custom chart with the desired result.
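As a rough sketch of the Matplotlib option, the snippet below builds one line per pretraining run from the config and summary of each finetuning run. It rests on several assumptions: that the hyperparameters are readable from `run.config` either under the flat dotted key or as a nested dict, that the test metric sits in `run.summary`, and that Matplotlib is installed. The helper names (`lookup`, `series_from_runs`, `plot_series`) and the project path in the commented usage are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace

PRETRAIN_KEY = "finetune.all_backbone_ckpts_in_dir"  # name of the pretraining run
EPOCH_KEY = "finetune.pretrained_epoch"              # epoch of the pretraining checkpoint
METRIC_KEY = "test_id_maj_vote_f1_macro"             # logged once per finetuning run


def lookup(config, dotted_key):
    """Read a value stored flat ("a.b") or nested ({"a": {"b": ...}})."""
    if dotted_key in config:
        return config[dotted_key]
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def series_from_runs(runs):
    """Return {pretraining run: [(pretraining epoch, metric), ...]}, sorted by epoch."""
    series = defaultdict(list)
    for run in runs:
        metric = run.summary.get(METRIC_KEY)
        if metric is None:  # skip runs that never logged the test metric
            continue
        key = lookup(run.config, PRETRAIN_KEY)
        series[key].append((lookup(run.config, EPOCH_KEY), metric))
    return {name: sorted(points) for name, points in series.items()}


def plot_series(series):
    """Draw one line per pretraining run and return the Matplotlib figure."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, points in sorted(series.items()):
        xs, ys = zip(*points)
        ax.plot(xs, ys, marker="o", label=str(name))
    ax.set_xlabel(EPOCH_KEY)
    ax.set_ylabel(METRIC_KEY)
    ax.legend(title="pretraining run")
    return fig


# Stand-in objects so the sketch runs without a wandb account; values are placeholders.
fake_runs = [
    SimpleNamespace(config={"finetune": {"all_backbone_ckpts_in_dir": "A", "pretrained_epoch": 2}},
                    summary={METRIC_KEY: 0.74}),
    SimpleNamespace(config={"finetune": {"all_backbone_ckpts_in_dir": "A", "pretrained_epoch": 1}},
                    summary={METRIC_KEY: 0.71}),
    SimpleNamespace(config={"finetune.all_backbone_ckpts_in_dir": "B", "finetune.pretrained_epoch": 1},
                    summary={METRIC_KEY: 0.65}),
    SimpleNamespace(config={"finetune": {"all_backbone_ckpts_in_dir": "B", "pretrained_epoch": 3}},
                    summary={}),  # no test metric logged, so it is skipped
]
demo_series = series_from_runs(fake_runs)

# With real data (hypothetical project path), roughly:
#   import wandb
#   runs = wandb.Api().runs("my-entity/my-finetuning-project")
#   fig = plot_series(series_from_runs(runs))
#   with wandb.init(project="my-finetuning-project", job_type="analysis") as run:
#       run.log({"f1_vs_pretraining_epoch": wandb.Image(fig)})
```

Logging the figure as an image gives a static chart in the workspace; the same `series` dictionary could instead be written to a table if an interactive panel is preferred.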
Okay, thank you for the suggestions!