Logging values to an ongoing run from a different process

Hi there!
In my use case I’m running a training loop and storing a model at a regular interval. During training, I also want to evaluate metrics such as the CIDEr score for image captioning. The problem is, computing these metrics takes a lot of time (~40 minutes), and the training is running on a cluster where I can’t evaluate the metrics for several reasons.

So my plan is to load each stored model on a separate machine as soon as it is saved, and evaluate the metrics there. Once that is done, I would like to log the metrics to the ongoing training run, with the step parameter set to the step at which the model was stored. By the time the evaluation finishes, the training run will already have progressed past that step.

Is this possible using the wandb api, without getting concurrency problems?

Thanks!

@dhansmair ,

Thank you for writing in with your support question. At this time you cannot update or log new metrics to an active run. However, you can do the following:

  • Once a run has finished, update its metrics, see here
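
As a rough sketch of what updating a finished run could look like: the snippet below uses the public API (`wandb.Api().run(...)`) and writes one summary entry per checkpoint step. The helper name `write_offline_metrics`, the key format `cider/step_<n>`, the run path, and the metric values are all placeholders chosen for illustration. Note that this writes summary values and does not add points to the step axis of the run's history.

```python
from typing import Any, Mapping


def write_offline_metrics(run: Any, metric_name: str,
                          values_by_step: Mapping[int, float]) -> None:
    """Store one summary entry per checkpoint step, then push the changes.

    `run` is expected to expose a dict-like `summary` with an `update()`
    method, as the run objects returned by `wandb.Api().run(...)` do.
    """
    for step, value in sorted(values_by_step.items()):
        # Hypothetical key format: encode the checkpoint step in the key.
        run.summary[f"{metric_name}/step_{step}"] = value
    # Persist the modified summary.
    run.summary.update()


if __name__ == "__main__":
    import wandb  # imported here so the helper itself has no hard dependency

    api = wandb.Api()
    # Placeholder path: replace with "<entity>/<project>/<run_id>" of a finished run.
    run = api.run("my-entity/my-project/run_id")
    # Made-up values standing in for scores computed on the evaluation machine.
    write_offline_metrics(run, "cider", {1000: 0.81, 2000: 0.93})
```

Passing the run object into the helper keeps the evaluation machine's logic separate from the API lookup, so it can be tried against a stand-in object before touching a real run.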

Please let me know if you have additional questions.

Regards,

Mohammad

Hello Mohammad,
thanks for the reply. I see. It does not help directly, but it may still be useful to me, thank you. Meanwhile, I managed to circumvent the problem by moving to a different server.

Best,
David

Hello @dhansmair ,

Thank you for letting me know you resolved your issue. I will mark this closed.

Regards,

Mohammad
