Sweeps ending in just 1 epoch

sporwal1818 · April 15, 2024, 1:07pm

The problem is this: I am trying to run a hyperparameter sweep with wandb. The first run of the sweep trains for the configured number of epochs, but every subsequent run trains for just 1 epoch. As evidence, I have attached an image in which you can see the runtime drop sharply as the sweep progresses.

Here is my code for performing the sweep:


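# Imports added for completeness (assumes Lightning 2.x)
import wandb
from torch.utils.data import DataLoader
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger
from lightning.pytorch.accelerators import find_usable_cuda_devices

# Defined elsewhere in the original script (not shown): sweep_config, tr_dataset,
# val_dataset, DenseNet, Classifier, early_stop_callback, rich_progress_bar.

wandb.login()

# Project name built from the fixed sweep parameters: "<model>__var-<classes>__fold-<fold>"
NAME = (
    sweep_config['parameters']['model_name']['value']
    + f"__var-{sweep_config['parameters']['num_classes']['value']}"
    + f"__fold-{sweep_config['parameters']['fold']['value']}"
)

print('NAME : ', NAME, '\n\n')
sweep_id = wandb.sweep(sweep_config, project=NAME)

def tune_hyperparams(config=None):
    # Each agent call starts a new W&B run; the sweep controller supplies the config
    with wandb.init(config=config):
        config = wandb.config
        print(config, '\n\n\n')
        num_workers = 8
        tr_loader = DataLoader(tr_dataset, batch_size=config['BATCH_SIZE'], shuffle=True, num_workers=num_workers)
        val_loader = DataLoader(val_dataset, batch_size=config['BATCH_SIZE'], shuffle=False, num_workers=num_workers)

        model_obj = DenseNet(densenet_variant=config['model_size'], in_channels=config['in_channels'],
                             num_classes=config['num_classes'], compression_factor=0.3, k=32, config=config)
        model = Classifier(model_obj)

        run_name = f"lr_{config['lr']} *** bs{config['BATCH_SIZE']} *** decay_{config['weight_decay']}"
        # WandbLogger reuses the run already started by wandb.init above
        wandb_logger = WandbLogger(project=NAME, name=run_name)

        # early_stop_callback and rich_progress_bar are module-level objects,
        # so the same instances are passed to every Trainer in the sweep
        trainer = Trainer(callbacks=[early_stop_callback, rich_progress_bar],
                          accelerator='gpu', max_epochs=config['epochs'],
                          logger=[wandb_logger], devices=find_usable_cuda_devices(1))

        trainer.fit(model, tr_loader, val_loader)
    wandb.finish()  # redundant: leaving the with-block already finishes the run


wandb.agent(sweep_id, tune_hyperparams, count=30)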

Please let me know how to tackle this problem.
Thanks in advance!

system · April 16, 2024, 11:23pm

Hello,

Thank you for contacting support.

To help resolve your issue as efficiently as possible, could you please provide the following information:

A link to the project.
A copy of the code snippet for the callbacks=[early_stop_callback] portion of your code.

Thanks again and I look forward to hearing from you.

Best regards,
Jason

sporwal1818 · April 18, 2024, 2:20am

Thanks for reaching out. Here is the link to the wandb project →

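Callbacks:

from lightning.pytorch.callbacks import EarlyStopping

# Stop when val_loss has not improved by more than min_delta for `patience` consecutive checks
early_stop_callback = EarlyStopping(
    monitor='val_loss',
    min_delta=0.0001,
    patience=20,
    verbose=True,
    mode='min'
)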
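To make patience concrete, here is a simplified model of what a patience-based early stopper with mode='min' counts. It is a sketch, not Lightning's implementation: `stop_epoch` is a hypothetical helper, and it assumes exactly one check per epoch (Lightning's default when validation runs once per epoch). A check counts as an improvement only if the loss drops below the best value so far minus `min_delta`.

```python
import math


def stop_epoch(val_losses, patience=20, min_delta=0.0001):
    """Return the 1-based epoch at which patience-based stopping fires (mode='min').

    Hypothetical helper: one check per epoch; returns None if stopping never fires.
    """
    best = math.inf  # nothing seen yet
    wait = 0         # consecutive checks without sufficient improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement larger than min_delta
            if wait >= patience:
                return epoch
    return None


# Loss improves for 5 epochs, then plateaus for 30 more.
plateau = [1.0, 0.9, 0.8, 0.7, 0.65] + [0.65] * 30
print(stop_epoch(plateau))  # 25
```

With the settings above (patience=20, min_delta=0.0001), a single fresh run whose loss stops improving after epoch 5 ends at epoch 25, i.e. after exactly 20 non-improving checks.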

Isn’t it possible to share a private project via a shareable link, as with MS Office documents?

system · April 22, 2024, 9:23pm

Hello,

Thank you for your patience while I investigated the issue. Reviewing the logs, I see the following:

Monitored metric val_loss did not improve in the last 55 records. Best score: 0.612. Signaling Trainer to stop.

Here is a direct link to the logs within your project page as well.

W&B will only terminate a run early if you explicitly enable early termination with early_terminate in your sweep config. More info can be found here
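For reference, opting in to W&B's own early termination looks roughly like the sketch below. The method, metric and parameter values are hypothetical placeholders, not the asker's actual settings; only the early_terminate block (type "hyperband" with min_iter) follows the documented sweep-config format. A config without that key leaves run length entirely to the training script.

```python
# Hypothetical sweep config; every value here is a placeholder for illustration.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-4, 3e-4, 1e-3]},
        "BATCH_SIZE": {"values": [16, 32]},
        "epochs": {"value": 100},
    },
}

# No "early_terminate" key: the sweep controller does not stop runs on its own.
print("early_terminate" in sweep_config)  # False

# Opting in: Hyperband-based early termination, first considered after min_iter iterations.
sweep_config_with_et = {
    **sweep_config,
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}
```

Either dict would then be passed to wandb.sweep(...) as in the question's code.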

This leaves the PyTorch EarlyStopping callback as the main culprit. Looking into their docs, they describe patience as “The number of events to wait if no improvement and then stop the training”.

I am not sure what they count as an event, but the logs indicate that EarlyStopping is what is ending your runs early.
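A likely explanation, assuming early_stop_callback is created once at module level and the same object is handed to every Trainer (as in the code in the question): the callback keeps its best score and wait counter on the instance, so the next run starts with the counter already at the patience limit and the previous run's best score, and stops after its first validation check unless it beats that score. The log is consistent with this reading, since a fresh callback with patience=20 would normally stop at 20 records, not 55. The sketch below simulates the effect with a hypothetical StatefulStopper class, a simplified stand-in rather than Lightning's code.

```python
import math


class StatefulStopper:
    """Hypothetical stand-in for a stateful early-stopping callback (mode='min').

    best_score and wait_count live on the instance, so they carry over
    if the same object is reused for several training runs.
    """

    def __init__(self, patience=20, min_delta=0.0001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = math.inf
        self.wait_count = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_score - self.min_delta:
            self.best_score, self.wait_count = val_loss, 0  # improvement
        else:
            self.wait_count += 1  # one more check without improvement
        return self.wait_count >= self.patience


def run_trial(stopper, val_losses):
    """Simulate one sweep run (one check per epoch); return epochs completed."""
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


# Every simulated run sees the same curve: it improves to 0.612, then plateaus.
curve = [0.9, 0.8, 0.7, 0.612] + [0.62] * 40

shared = StatefulStopper()  # one object reused by all runs, like early_stop_callback
shared_lengths = [run_trial(shared, curve) for _ in range(3)]

fresh_lengths = [run_trial(StatefulStopper(), curve) for _ in range(3)]  # new object per run

print(shared_lengths)     # [24, 1, 1]
print(fresh_lengths)      # [24, 24, 24]
print(shared.wait_count)  # 22, already past patience=20
```

In the question's script, the corresponding change would be to construct the EarlyStopping(...) callback (and, to be safe, the progress bar) inside tune_hyperparams, so each run of the sweep gets a fresh instance.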

Let me know if you have any more questions or have more context you would like to share.

Best,
Jason

system · April 25, 2024, 6:08pm

Hi, since we have not heard back from you, we are going to close this request. If you would like to reopen the conversation, please let us know!
