Error handling

tahsin_43 · June 13, 2024, 7:36am

I am currently running a sweep with different configurations for a ResNet model, and I noticed I was getting “CUDA OUT OF MEMORY” errors. This is more of a general question, but how can we manually handle wandb.errors, specifically “Runtime Errors”?

Let’s say I am loading a model and it runs out of memory, or, I don't know, the shape is wrong. Wandb catches these errors and either moves on to another run instance or tries the same run over and over. Is there a way I can wrap this in a try-except clause?
I tried catching RuntimeError in my except clause, but it does not seem to catch it.

Example Code:

try:
    model = load_model(wandb.config, pipeline_parameters['model_type'])
except RuntimeError as e:
    print('exception met')
    # del X_train
    # del Y_train
    # del X_val
    # del Y_val
    gc.collect()
    torch.cuda.empty_cache()
    run.finish(exit_code=0)
    return 1
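
For reference, the pattern being asked about can be sketched without any wandb or torch dependency. This is only a minimal illustration, not wandb's own mechanism: run_trial, is_oom_error, and the train_fn, cleanup_fn, and finish_fn callables are hypothetical names introduced here. It assumes that a CUDA out-of-memory failure reaches the training function as a RuntimeError whose message contains "out of memory", and it re-raises every other RuntimeError (for example a shape mismatch) so that real bugs stay visible. Cleanup runs after the except block has exited, since the exception can keep references to the failed frame alive while it is being handled.

```python
import gc


def is_oom_error(exc: BaseException) -> bool:
    """Heuristic (assumption): OOM shows up as RuntimeError with 'out of memory' in the text."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def run_trial(train_fn, cleanup_fn=gc.collect, finish_fn=None):
    """Run one trial. Return 0 on success, 1 if it was skipped because of OOM.

    All three callables are hypothetical placeholders:
      train_fn   -- loads the model and trains it
      cleanup_fn -- frees memory, e.g. gc.collect() plus torch.cuda.empty_cache()
      finish_fn  -- receives the exit code, e.g. lambda c: run.finish(exit_code=c)
    """
    oom = False
    try:
        train_fn()
    except RuntimeError as exc:
        if not is_oom_error(exc):
            raise  # not an OOM: let shape errors and other bugs propagate
        print(f"Skipping trial after OOM: {exc}")
        oom = True

    # Clean up outside the except block, once the exception is no longer held.
    if oom:
        cleanup_fn()

    exit_code = 1 if oom else 0
    if finish_fn is not None:
        finish_fn(exit_code)
    return exit_code
```

In a sweep function, train_fn would wrap the load_model call and the training loop, and finish_fn would forward the code to run.finish. Passing a non-zero exit code marks the run as failed rather than finished, which may or may not be what you want for skipped configurations.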

artsiom · June 17, 2024, 4:44pm

Hi @tanishqgautam! Thank you very much for writing in.

Thank you so much for the example above. Could you please send me a code snippet of the desired way of handling these errors on your side?

artsiom · June 20, 2024, 6:02pm

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

tahsin_43 · June 20, 2024, 7:41pm

It seems that when we wrap it to catch the wandb error, a wandb.error error is thrown. Is there any way to handle it?

artsiom · June 27, 2024, 4:17pm

Thank you for the follow up and apologies for the delay in reply, this thread got buried.
Could you please point me to a full wandb error you are currently running into?

artsiom · July 2, 2024, 5:02pm

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

artsiom · July 8, 2024, 6:07pm

Hi Tahsin,

Since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

Warmly,
Artsiom
