Hi,
I’m currently working on a self-supervised representation learning project, and to evaluate the quality of my models I train a linear classifier on the outputs of my (frozen) trained encoder and look at the downstream classification accuracy.
This evaluation procedure is done separately from the training of the encoder. Is there still a way to add the metrics computed during this evaluation phase to the standard metrics I log during the training phase, in the same run panel?
More generally, can I add metrics to a run that is already finished?
Hi @mohammadbakir, thanks a lot for your reply! In fact, my problem is a bit different, let me try to describe it better:
First, I have a training phase during which metrics are logged, e.g. the loss. Let’s say I have three runs: version_0, version_1 and version_2. For each metric I therefore have one chart with one curve per run. For example, I have a loss panel containing 3 curves, one for each version.
Then, I have a second phase during which new metrics are computed, e.g. downstream accuracy. What I’d like is therefore a new panel in my project called accuracy, also with 3 curves, one for version_0, version_1 and version_2. Moreover, these metrics should really be added to the same runs created during the training phase, so that, e.g., the metrics computed during both phases are displayed together in the parameters table, both curves become invisible if I untick a run, etc.
However, the tricky part is that I really cannot compute the accuracy on the fly during the training phase, since it’s self-supervised learning, and if I just logged the accuracy afterwards it would create a new run instead of adding the metric to the existing one, which would make things quite unreadable.
I was therefore wondering if there is a way to somehow “reload” a wandb logger for a specific run so that I can add the new metrics to this run even if it’s already finished.
Hi @ari0u, I appreciate your additional feedback.
This approach of first logging one metric, loss, to a run, then revisiting/resuming the run to log a different metric, accuracy, starting from step zero again is not supported. The wandb logging step must be monotonically increasing in each call, otherwise the step value is ignored during your call to log(). Now, if you are not interested in logging accuracy starting at step 0, you could resume the previously finished run using its run id and log additional metrics to it. This is problematic, however, as the new metric can only be logged starting from the last known/registered step of the run.
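For reference, here is a minimal sketch of what resuming a finished run looks like; the project name and run id below are placeholders for your actual values:

```python
import wandb

# Resume the run that was created during the training phase.
# "my-project" and "abc123xy" are placeholders for your project name and run id.
run = wandb.init(project="my-project", id="abc123xy", resume="must")

# The new metric is appended after the run's last recorded step;
# earlier steps cannot be written to.
run.log({"accuracy": 0.87})

run.finish()
```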
One approach to get around the issue you are running into is to assign each of the runs to a specific group. For example, set group = version_0 for any run that logs metrics for this specific version of the model. You could then enable grouping in the workspace to help track the different metrics for each experiment; see this example workspace.
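Roughly, the idea looks like this; the project name, group name and values below are just placeholders:

```python
import wandb

# Training phase: log the loss in a run that belongs to the group for this model version.
train_run = wandb.init(project="my-project", group="version_0", job_type="train")
for step, loss in enumerate([0.9, 0.5, 0.3]):  # placeholder values
    train_run.log({"loss": loss}, step=step)
train_run.finish()

# Evaluation phase (run later, as a separate run in the same group):
eval_run = wandb.init(project="my-project", group="version_0", job_type="eval")
for step, acc in enumerate([0.60, 0.72, 0.81]):  # placeholder values
    eval_run.log({"accuracy": acc}, step=step)
eval_run.finish()
```

With grouping enabled in the workspace, the loss and accuracy curves belonging to each version_* group are then displayed together.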
Hope this helps and please let us know if you have additional questions.
Hi,
Thanks again for your help. It’s a bit sad that my issue cannot be solved as is, but the workaround you suggested with the groups looks fine.
Do you have any idea why the logging step has to be monotonically increasing in W&B? Are there fundamental implementation constraints making it impossible, or is it just not implemented for now and could eventually be supported one day? Should I consider sending a feature request?
Hi @ari0u, I’m jumping in here as Mohammad is out now. If you’d like to log metrics to a previous step I can create a new feature request for this. Could you please share with me a little bit about your use-case? Thanks!
Hi,
That would be great, yes! My use case is to log the so-called “downstream accuracy” when working on self-supervised representation learning methods. The point of those methods is basically to train a neural network to learn semantic representations of some input data, i.e. to map each data sample from a dataset to a vector in a latent space, in an unsupervised way, so without using any labels. The output vector corresponding to an input sample is called its representation.
Then, a common practice to evaluate the quality of the produced representations is to train a linear classifier, with these representations as inputs, on a supervised classification task. The idea here is that if the original encoder is able to produce semantically relevant representations, then classes are likely to be linearly separable. For images, that would mean, e.g., that all dogs from the dataset are mapped to vectors that are close to each other in the latent space, but far from the vectors that represent planes.
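In code, this evaluation boils down to something like the following sketch, where the random arrays stand in for the frozen encoder’s outputs and the dataset’s labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: in practice these are the frozen encoder's representations
# of the labelled train/test samples, together with the corresponding labels.
train_reprs, train_labels = np.random.randn(1000, 128), np.random.randint(0, 10, 1000)
test_reprs, test_labels = np.random.randn(200, 128), np.random.randint(0, 10, 200)

# Linear evaluation: fit a simple linear classifier on top of the representations
# and use its test accuracy as the "downstream accuracy" of the encoder.
clf = LogisticRegression(max_iter=1000).fit(train_reprs, train_labels)
downstream_accuracy = clf.score(test_reprs, test_labels)
```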
However, because of the self-supervised setting, training the encoder that produces the representations and training the linear classifier that learns to separate the classes are two different phases that are run independently; it is really an a posteriori evaluation. Therefore, to log the values in wandb, I would need to be able to add the downstream accuracy metric after training and log values for this metric at past steps.
In terms of wandb features, what I would require is to be able to use wandb.log(..., step=my_step) even if my_step is a past step. Since I know when I have to log past values, there could be a kwarg in wandb.init like enable_log_past_steps, which could be set to True to enable what I want to do but would default to False to keep the current behaviour unless stated otherwise.
Since self-supervised learning is becoming a more and more active field of research, I’m sure that such a feature would benefit many people. Please do not hesitate to tell me if something is unclear in my explanations or if you want more details about the usage.
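To make the request concrete, here is roughly how I imagine using it; enable_log_past_steps is of course hypothetical and does not exist today, and the ids and values are placeholders:

```python
import wandb

# Hypothetical kwarg (does not exist today): opt in to logging at past steps.
run = wandb.init(project="my-project", id="abc123xy", resume="must",
                 enable_log_past_steps=True)

# After the linear evaluation phase, log the downstream accuracy at the
# (already past) steps of the finished self-supervised training run.
for my_step, acc in [(0, 0.55), (10, 0.68), (20, 0.74)]:  # placeholder values
    run.log({"downstream_accuracy": acc}, step=my_step)

run.finish()
```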
That could be kind of a solution, but if I update the config, that would mean that the downstream accuracy becomes a parameter instead of a metric, right? Then I’d lose nice features like logging it for different epochs, using it to compute parameter importance, etc.
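Just to make sure I understand, I guess the config suggestion would look something like this with the public API, with a placeholder run path:

```python
import wandb

# Update the config of an already finished run via the public API.
api = wandb.Api()
run = api.run("entity/project/run_id")  # placeholder run path
run.config["downstream_accuracy"] = 0.74  # stored as a config value, not a metric
run.update()
```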
Thank you very much @ari0u, this is great feedback and I’ll submit it internally. Regarding your last question, this is correct: the accuracy would then be a parameter under the config and not a metric.