Sweeps: Waiting for W&B process to finish... (failed 1)

aryamohan23 · March 7, 2023, 3:01am

Hi everyone,
I am trying to sweep my hyperparameters for my TensorFlow model. I am using Bayes as the sweeping method.

My train() function calls model.fit() several times, because I am training a progressive GAN and each stage needs its own fit.

After the first model.fit() completes successfully, the run stops with this error:

Waiting for W&B process to finish... (failed 1)

What should I do?

I followed this tutorial here to use Wandb sweeps: Google Colab

And here is a look at my sweep_train() function:

import wandb

# PGAN, GANMonitor, train and the upper-case constants are defined elsewhere in the script.
def sweep_train(config_defaults=None):
    # Start a W&B run; the sweep agent fills in wandb.config
    run = wandb.init(config=config_defaults, resume=True)

    pgan = PGAN(latent_dim=NOISE_DIM, d_steps=wandb.config.D_STEPS)

    cbk = GANMonitor(num_img=NUM_IMGS_GENERATE, latent_dim=NOISE_DIM)
    cbk.set_steps(steps_per_epoch=STEPS_PER_EPOCH, epochs=wandb.config.EPOCHS)  # 110, 6
    cbk.set_prefix(prefix='0_init')

    train(wandb.config.G_LR, wandb.config.D_LR, wandb.config.R_LR,
          wandb.config.EPOCHS, wandb.config.D_STEPS, cbk, pgan)

    run.finish()
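
For readers following along, a Bayes sweep over the hyperparameters read in sweep_train() might be configured roughly as below. This is only a sketch: the parameter names come from the function above, but the ranges, the metric name g_loss, and the project name are assumptions, not values from the original post. Bayesian search needs a metric to optimize, so one has to be logged from inside train().

```python
# Hypothetical sweep configuration. Only the parameter names are taken from
# sweep_train(); every range and the metric name are illustrative assumptions.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "g_loss", "goal": "minimize"},  # assumed metric name
    "parameters": {
        "G_LR": {"min": 1e-5, "max": 1e-2},
        "D_LR": {"min": 1e-5, "max": 1e-2},
        "R_LR": {"min": 1e-5, "max": 1e-2},
        "EPOCHS": {"values": [4, 6, 8]},
        "D_STEPS": {"values": [1, 2, 3]},
    },
}

# With wandb installed and sweep_train defined, the sweep would be started with:
#   sweep_id = wandb.sweep(sweep_config, project="pgan-sweeps")  # assumed project name
#   wandb.agent(sweep_id, function=sweep_train, count=10)
```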

raphael-sanandres · March 9, 2023, 9:27pm

Hello @aryamohan23!

Are there more details in the output, such as a traceback right before Waiting for W&B process to finish... (failed 1)? There should be a traceback into your code showing which error is causing the run to fail. W&B uploads every run regardless of its status (success, failed, crashed, etc.), so the (failed 1) message only indicates the final state of the run, not its cause.
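
If the traceback is getting lost in the sweep output, one option is to wrap the training function so the exception is printed explicitly before it propagates. The helper below is a minimal sketch; run_and_report is a hypothetical name, not part of the wandb API.

```python
import traceback


def run_and_report(fn, *args, **kwargs):
    """Call fn; if it raises, print the full traceback and re-raise.

    Re-raising keeps the original behaviour (the run is still marked failed),
    while making sure the underlying error is visible in the console.
    """
    try:
        return fn(*args, **kwargs)
    except Exception:
        traceback.print_exc()
        raise


# Possible usage with the sweep agent (sweep_id and sweep_train come from your script):
#   wandb.agent(sweep_id, function=lambda: run_and_report(sweep_train))
```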

raphael-sanandres · March 14, 2023, 11:03pm

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

aryamohan23 · March 15, 2023, 10:50am

Hello, sorry for the delay. I get this error:

wandb: ERROR Dropped streaming file chunk (see wandb/debug-internal.log)

I realized it was a network issue, so I set wandb to offline mode. But even when I try to upload the run afterwards with wandb sync, I get the same error.
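
For reference, the offline workflow described here can be sketched as follows. Setting the WANDB_MODE environment variable to offline keeps runs on disk, and each one is uploaded later with the wandb sync command. The helper name sync_commands is hypothetical, and the sketch assumes offline runs are stored in directories named offline-run-* under the local wandb folder.

```python
import os
from pathlib import Path

# Must be set before wandb.init() is called so the run is recorded locally only.
os.environ["WANDB_MODE"] = "offline"


def sync_commands(wandb_dir="wandb"):
    """Build one `wandb sync` command per offline run directory found.

    The commands are returned rather than executed, so they can be inspected
    or passed to subprocess.run() once the network is reachable again.
    """
    runs = sorted(Path(wandb_dir).glob("offline-run-*"))
    return [["wandb", "sync", str(run)] for run in runs if run.is_dir()]
```

If the network problem is still present when syncing, the same upload error is to be expected, since wandb sync uses the same connection to the W&B servers.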

raphael-sanandres · March 20, 2023, 9:25pm

Hello Arya!

Could you provide the debug.log and debug-internal.log for the run? They should be in the wandb folder, in the same directory the script was run from. The wandb folder contains one subfolder per run, named run-DATETIME-ID. Could you retrieve the debug.log and debug-internal.log files from the subfolder that belongs to the run that is having issues?
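
A quick way to locate those files is sketched below. It assumes the layout described above (one run-DATETIME-ID folder per run inside wandb) and searches each run folder recursively, since the logs may sit in a subfolder. The function name find_debug_logs is hypothetical.

```python
from pathlib import Path


def find_debug_logs(wandb_dir="wandb"):
    """Map each run folder name to the debug*.log files found inside it."""
    logs_by_run = {}
    # "*run-*" also matches offline runs, whose folders start with "offline-run-".
    for run_dir in sorted(Path(wandb_dir).glob("*run-*")):
        if not run_dir.is_dir():
            continue
        logs = sorted(run_dir.rglob("debug*.log"))
        if logs:
            logs_by_run[run_dir.name] = logs
    return logs_by_run


if __name__ == "__main__":
    for run_name, logs in find_debug_logs().items():
        print(run_name)
        for log in logs:
            print("   ", log)
```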

raphael-sanandres · March 23, 2023, 11:45pm

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

aryamohan23 · April 1, 2023, 1:23pm

Hello, sorry for coming back to this so late.
I seem to have resolved that issue; it looks like it was just a VPN problem.
However, my logged images are now not showing up in the ‘Charts’ tab of my dashboard, even though all of them can be seen in the ‘Files’ tab.
The run also shows as ‘crashed’ even though the files were logged.

Attached the debug and debug-internal files:
debug-internal
debug

Thanks in advance

system · May 31, 2023, 1:23pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
