Sweeps: Waiting for W&B process to finish... (failed 1)

Hi everyone,
I am trying to sweep my hyperparameters for my TensorFlow model. I am using Bayes as the sweeping method.

My train() function calls model.fit() several times because I am training a progressive GAN.

After the first model.fit() completes successfully, I get this error:

Waiting for W&B process to finish... (failed 1)

What should I do?

I followed this tutorial here to use Wandb sweeps: Google Colab

And here is a look at my sweep_train() function:

def sweep_train(config_defaults=None):
    # Start a W&B run; the sweep agent supplies the hyperparameters via wandb.config
    run = wandb.init(config=config_defaults, resume=True)

    pgan = PGAN(latent_dim=NOISE_DIM, d_steps=wandb.config.D_STEPS)

    cbk = GANMonitor(num_img=NUM_IMGS_GENERATE, latent_dim=NOISE_DIM)
    cbk.set_steps(steps_per_epoch=STEPS_PER_EPOCH, epochs=wandb.config.EPOCHS)  # 110, 6
    cbk.set_prefix(prefix='0_init')

    # train() is where model.fit() gets called several times
    train(wandb.config.G_LR, wandb.config.D_LR, wandb.config.R_LR, wandb.config.EPOCHS, wandb.config.D_STEPS, cbk, pgan)

    run.finish()

Hello @aryamohan23!

Are there more details in the traceback, for example the lines right before Waiting for W&B process to finish... (failed 1)? There should be a traceback into the code showing which error is causing the run to fail. W&B uploads all runs regardless of status (success, failed, crashed, etc.), so the (failed 1) message only indicates the state of the run.
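
If the traceback is hard to spot in the sweep output, one option is to wrap the training call so the full traceback is printed before the error propagates. This is only a minimal sketch: run_and_report is a hypothetical helper name (not part of wandb), and fn stands for your own function, for example train.

```python
import traceback


def run_and_report(fn, *args, **kwargs):
    """Call fn; on failure print the full traceback, then re-raise.

    run_and_report is a hypothetical helper, not a wandb API.
    """
    try:
        return fn(*args, **kwargs)
    except Exception:
        # Print the complete traceback to stderr so the failing line
        # is visible before the run is marked as failed.
        traceback.print_exc()
        raise
```

Inside sweep_train() this would be used as run_and_report(train, ...) with the same arguments you pass to train() today. Re-raising keeps the run's failed status intact.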

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hello, sorry for the delay. I get this error:

wandb: ERROR Dropped streaming file chunk (see wandb/debug-internal.log)

I realized it was a network issue, so I set wandb to offline mode. But even when I try to sync the run with wandb sync, I get the same error.

Hello Arya!

Could you provide the debug.log and debug-internal.log for the run? They should be located in the wandb folder in the same directory where the script was run. The wandb folder contains one folder per run, named run-DATETIME-ID. Could you retrieve the debug.log and debug-internal.log files from the folder that belongs to the run that is having issues?
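
If it helps, the log files can be listed per run folder with a short script. This is a rough sketch, not an official tool: find_debug_logs is a hypothetical helper name, and it assumes the logs may sit in a subfolder of each run folder and that offline runs use an offline-run- prefix, so it searches recursively and matches both prefixes.

```python
from pathlib import Path


def find_debug_logs(wandb_dir="wandb"):
    """Map each run folder name to the debug logs found inside it.

    find_debug_logs is a hypothetical helper, not a wandb API.
    """
    wanted = {"debug.log", "debug-internal.log"}
    found = {}
    root = Path(wandb_dir)
    if not root.is_dir():
        return found
    # Assumption: run folders are named run-DATETIME-ID, and offline
    # runs offline-run-DATETIME-ID; "*run-*" matches both.
    for run_dir in sorted(root.glob("*run-*")):
        if not run_dir.is_dir():
            continue
        # Search recursively in case the logs live in a subfolder.
        logs = sorted(p for p in run_dir.rglob("*.log") if p.name in wanted)
        if logs:
            found[run_dir.name] = logs
    return found


if __name__ == "__main__":
    for run_name, paths in find_debug_logs().items():
        print(run_name)
        for path in paths:
            print("   ", path)
```

Run it from the directory that contains the wandb folder, then pick the entry whose DATETIME matches the failing run.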

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.