Hi everyone,
I am trying to sweep the hyperparameters of my TensorFlow model, using Bayes as the sweep method.
In my train() function I have several .fit() calls, because I am training a progressive GAN and need to call model.fit() several times.
After the first model.fit() completes successfully, I get this error:
Waiting for W&B process to finish... (failed 1)
What should I do?
I followed this tutorial here to use Wandb sweeps: Google Colab
And here is a look at my sweep_train() function:
def sweep_train(config_defaults=None):
    # Initialize wandb with a sample project name
    run = wandb.init(config=config_defaults, resume=True)
    pgan = PGAN(latent_dim=NOISE_DIM, d_steps=wandb.config.D_STEPS)
    cbk = GANMonitor(num_img=NUM_IMGS_GENERATE, latent_dim=NOISE_DIM)
    cbk.set_steps(steps_per_epoch=STEPS_PER_EPOCH, epochs=wandb.config.EPOCHS)  # 110, 6
    cbk.set_prefix(prefix='0_init')
    train(wandb.config.G_LR, wandb.config.D_LR, wandb.config.R_LR, wandb.config.EPOCHS, wandb.config.D_STEPS, cbk, pgan)
    run.finish()
Hello @aryamohan23!
Are there more details in the traceback, for example right before Waiting for W&B process to finish... (failed 1)? There should be a traceback into the code showing which error is causing the run to fail. W&B uploads all runs regardless of status (success, failed, crashed, etc.), so the (failed 1) message only indicates the state of the run.
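If the traceback is hard to spot in the notebook output, one option is to wrap the function handed to the sweep agent so that any exception is printed in full before it propagates. Below is a minimal sketch using only the standard library; the decorator name print_traceback is hypothetical and is not part of wandb.

```python
import functools
import sys
import traceback


def print_traceback(fn):
    """Hypothetical helper (not a wandb API): print the full traceback of any
    exception raised by fn, then re-raise it so the failure is not hidden."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Print the complete stack trace to stderr before propagating.
            traceback.print_exc(file=sys.stderr)
            raise

    return wrapper
```

Decorating sweep_train with @print_traceback should then print the original exception right above the Waiting for W&B process to finish... (failed 1) line.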
Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.
Hello, sorry for the delay. I get this error:
wandb: ERROR Dropped streaming file chunk (see wandb/debug-internal.log)
I realized it was a network issue, so I set wandb to offline mode. But even when I try to sync the run with wandb sync, I get the same error.
Hello Arya!
Could you provide the debug.log and debug-internal.log for the run? They should be located in the wandb folder, in the same directory where the script was run. The wandb folder contains folders named run-DATETIME-ID, each associated with a single run. Could you retrieve the debug.log and debug-internal.log files from the folder of the run that is having issues?
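In case it helps to locate those files, here is a rough sketch that lists the debug logs found under each run folder. The function name find_debug_logs is hypothetical. The search is recursive on the assumption that the logs may sit in a subfolder of the run folder, and the folder pattern assumes offline runs also contain "run-" in their name.

```python
from pathlib import Path


def find_debug_logs(wandb_dir="wandb"):
    """Map each run folder name to the debug logs found anywhere inside it."""
    wanted = {"debug.log", "debug-internal.log"}
    logs = {}
    root = Path(wandb_dir)
    if not root.is_dir():
        return logs
    # Matches run-DATETIME-ID; other prefixes before "run-" are also accepted.
    for run_dir in sorted(root.glob("*run-*")):
        if not run_dir.is_dir():
            continue
        # Search recursively, since the logs may sit in a subfolder.
        found = sorted(p for p in run_dir.rglob("*.log") if p.name in wanted)
        if found:
            logs[run_dir.name] = found
    return logs


if __name__ == "__main__":
    for run_name, paths in find_debug_logs().items():
        print(run_name)
        for path in paths:
            print("   ", path)
```

Running this from the directory that contains the wandb folder prints each run folder followed by the paths of its two log files, which makes it easier to pick out the run that failed by its timestamp.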
Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.