I’m able to log a training run with PyTorch Lightning + wandb in Google Colab, based on these instructions. Here’s a snippet of the code I’m running:
```
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="p", entity="e")
trainer = pl.Trainer(
    logger=wandb_logger,  # W&B integration
    ..
)
trainer.fit(model)
```
It outputs the link to the run, and I can see all of the stats etc.
However, how do I retrain? If I try retraining with:
```
trainer = pl.Trainer(
    logger=wandb_logger,  # W&B integration
    ..
)
trainer.fit(model)
```
It doesn’t seem to log a new run. It doesn’t even appear to log the data to the existing run; the data is just completely lost.
If I try to create a new wandb logger before retraining:
```
wandb_logger = WandbLogger(project="p", entity="e")
trainer = pl.Trainer(
    logger=wandb_logger,  # W&B integration
    ..
)
trainer.fit(model)
```
it times out after 1 minute with this error:
```
wandb: ERROR Error communicating with wandb process
wandb: ERROR For more info see: https://docs.wandb.ai/library/init#init-start-error
Problem at: /usr/local/lib/python3.7/dist-packages/pytorch_lightning/loggers/wandb.py 406 experiment
---------------------------------------------------------------------------
UsageError                                Traceback (most recent call last)
<ipython-input-44-6016437e3426> in <module>
----> 1 wandb_logger = WandbLogger(project="p", entity="e")

6 frames
/usr/local/lib/python3.7/dist-packages/wandb/sdk/wandb_init.py in init(self)
    717             backend.cleanup()
    718             self.teardown()
--> 719         raise UsageError(error_message)
    720     assert run_result and run_result.run
    721     if run_result.run.resumed:

UsageError: Error communicating with wandb process
For more info see: https://docs.wandb.ai/library/init#init-start-error
```
Am I using wandb + PyTorch Lightning the correct way? What is the expected lifecycle of the wandb logger in relation to the `pl.Trainer` object?
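
To make the question concrete, here is a minimal sketch of the lifecycle I would guess is intended: one fresh logger and one fresh trainer per `fit` call, with the run explicitly closed afterwards. The helper name `fit_with_fresh_run` and its parameters are hypothetical, not part of either library. The sketch assumes that calling `wandb.finish()` between runs is what lets the next `WandbLogger` start cleanly in a notebook. I have not confirmed that this resolves the timeout above.

```python
from typing import Any, Callable


def fit_with_fresh_run(
    make_logger: Callable[[], Any],
    make_trainer: Callable[[Any], Any],
    finish_run: Callable[[], None],
    model: Any,
) -> Any:
    """Run one training pass with its own logger, then close the run.

    Hypothetical helper: the factories are passed in so that each call
    builds a new logger and a new trainer instead of reusing old ones.
    """
    logger = make_logger()          # new logger, so a new run, per fit
    trainer = make_trainer(logger)  # new trainer bound to that logger
    try:
        trainer.fit(model)
    finally:
        finish_run()                # close the run even if fit() raises
    return trainer


# Intended use with the real libraries (not executed here):
#   import wandb
#   import pytorch_lightning as pl
#   from pytorch_lightning.loggers import WandbLogger
#   fit_with_fresh_run(
#       lambda: WandbLogger(project="p", entity="e"),
#       lambda lg: pl.Trainer(logger=lg),
#       wandb.finish,
#       model,
#   )
```

Calling the helper a second time would then stand in for "retraining": it creates a second logger only after the first run has been finished.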