Error handling

I am currently running a sweep with different configurations for a ResNet model, and I noticed I was getting “CUDA out of memory” errors. This is more of a general question, but how can we manually handle wandb errors, specifically RuntimeErrors?

Let’s say I am loading a model and it runs out of memory, or, say, the shape is wrong. Wandb catches these errors and either moves on to another run or tries the same run over and over. Is there a way I can wrap this in a try-except clause?
I tried catching it as a RuntimeError in my except clause, but it does not seem to catch it.

Example Code:

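import gc

import torch
import wandb


def train():
    # load_model and pipeline_parameters are defined elsewhere in the script
    run = wandb.init()
    try:
        model = load_model(wandb.config, pipeline_parameters['model_type'])
    except RuntimeError as e:
        print('exception met:', e)
        # del X_train
        # del Y_train
        # del X_val
        # del Y_val
        gc.collect()
        torch.cuda.empty_cache()
        run.finish(exit_code=0)
        return 1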
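A minimal sketch of the same idea as a reusable helper, assuming the failure reaches your code as a Python `RuntimeError` whose message contains "out of memory" (which is how PyTorch reports CUDA OOM). `guarded_step` and `free_gpu_memory` are hypothetical helpers, not wandb or PyTorch APIs. Cleanup happens after the `except` block on purpose: inside it, the exception's traceback still references the failing frames and any tensors they hold, which the PyTorch FAQ also points out. Exceptions that are not `RuntimeError` still propagate.

```python
import gc

try:
    import torch  # optional: only needed to release PyTorch's cached CUDA memory
except ImportError:
    torch = None


def free_gpu_memory():
    """Run the garbage collector and, if CUDA is available, empty PyTorch's cache."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def guarded_step(step, *args, **kwargs):
    """Call step(*args, **kwargs) and classify RuntimeErrors instead of letting them escape.

    Returns ("ok", result), ("oom", message) or ("runtime_error", message).
    Exceptions that are not RuntimeError propagate unchanged.
    """
    try:
        return "ok", step(*args, **kwargs)
    except RuntimeError as exc:
        status = "oom" if "out of memory" in str(exc).lower() else "runtime_error"
        message = str(exc)
    # Outside the except block the exception (and its traceback) is released,
    # so freeing memory here can actually reclaim the failing step's tensors.
    free_gpu_memory()
    return status, message
```

Inside a sweep's train function you could then call, for example, `status, model = guarded_step(load_model, wandb.config, pipeline_parameters['model_type'])` and, when `status != "ok"`, record it (e.g. `run.summary["failure"] = status`) before calling `run.finish(...)`.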

Hi @tanishqgautam! Thank you very much for writing in.

Thank you so much for the example above. Could you please send me a code snippet of the desired way of handling these errors on your side?

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

It seems that when we wrap it in a try-except for the wandb error, a wandb.errors error is thrown. Is there any way to handle it?
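One likely reason an `except RuntimeError` clause misses it: wandb's own exceptions derive from `wandb.errors.Error` rather than from `RuntimeError` (an assumption worth checking against the installed wandb version). Since `except` accepts a tuple of exception types, a minimal sketch of handling both in one clause could look like this; `HANDLED` and `call_and_report` are hypothetical names, and the import is guarded so the sketch still runs without wandb installed.

```python
try:
    import wandb
    # Assumption: wandb.errors.Error is the common base class of wandb's own exceptions.
    WANDB_ERRORS = (wandb.errors.Error,)
except (ImportError, AttributeError):
    WANDB_ERRORS = ()

# except accepts a tuple, so PyTorch RuntimeErrors and wandb errors share one handler.
HANDLED = (RuntimeError,) + WANDB_ERRORS


def call_and_report(fn, *args, **kwargs):
    """Return (True, result) on success or (False, "ExceptionType: message") on a handled error."""
    try:
        return True, fn(*args, **kwargs)
    except HANDLED as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

This only helps if the error is raised inside the code you wrap; an error raised elsewhere (for example, before or after your function runs) cannot be caught by this clause.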

Thank you for the follow-up, and apologies for the delay in replying; this thread got buried.
Could you please share the full wandb error you are currently running into?

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi Tahsin,

Since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

Warmly,
Artsiom