Unable to log each run when using the PyTorch Lightning integration

I’m able to log a training run with PyTorch Lightning + wandb in Google Colab, based on these instructions. Here’s a snippet of the code I’m running:

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="p", entity="e")
trainer = pl.Trainer(
    logger=wandb_logger,    # W&B integration
    ..
)
trainer.fit(model)

It outputs the link to the run, and I can see all of the stats, etc.

However, how can I retrain? If I try re-training with:

trainer = pl.Trainer(
    logger=wandb_logger,    # W&B integration
    ..
)
trainer.fit(model)

It doesn’t seem to log a new run. It doesn’t even seem to log the data to the existing run; the data is just completely lost.

If I try to create a new wandb logger before the re-training:

wandb_logger = WandbLogger(project="p", entity="e")
trainer = pl.Trainer(
    logger=wandb_logger,    # W&B integration
    ..
)
trainer.fit(model)

It times out after one minute with this error:

wandb: ERROR Error communicating with wandb process
wandb: ERROR For more info see: https://docs.wandb.ai/library/init#init-start-error
Problem at: /usr/local/lib/python3.7/dist-packages/pytorch_lightning/loggers/wandb.py 406 experiment
---------------------------------------------------------------------------
UsageError                                Traceback (most recent call last)
<ipython-input-44-6016437e3426> in <module>
----> 1 wandb_logger = WandbLogger(project="p", entity="e")

6 frames
/usr/local/lib/python3.7/dist-packages/wandb/sdk/wandb_init.py in init(self)
    717                     backend.cleanup()
    718                     self.teardown()
--> 719                 raise UsageError(error_message)
    720             assert run_result and run_result.run
    721             if run_result.run.resumed:

UsageError: Error communicating with wandb process
For more info see: https://docs.wandb.ai/library/init#init-start-error

Am I using wandb + PyTorch Lightning the correct way? What is the expected lifecycle of the WandbLogger in relation to the pl.Trainer object?

Actually, never mind, it’s working now! I don’t know what I changed while debugging that fixed it; maybe I had forgotten to call .finish() on the first run.

Here’s the gist of the code I’m running:

import wandb
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="p", entity="e", log_model=True)
trainer = pl.Trainer(
    logger=wandb_logger,    # W&B integration
    ..
)
trainer.fit(model)
wandb.finish()

I’m able to run the above snippet repeatedly and it creates a new run each time.
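
In case it helps anyone else, here’s a slightly fuller, self-contained sketch of that lifecycle in a loop. The toy model, the random data, max_epochs=1, and the learning-rate values are made up purely for illustration, and project="p" / entity="e" are the same placeholders as above; the point is just that each iteration builds a fresh WandbLogger and calls wandb.finish() after trainer.fit(), so every iteration shows up as its own run:

import torch
import wandb
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.loggers import WandbLogger

# Toy LightningModule just so the sketch is self-contained.
class ToyModel(pl.LightningModule):
    def __init__(self, lr):
        super().__init__()
        self.lr = lr
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)

# Random data, purely for illustration.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16
)

for lr in [1e-2, 1e-3]:  # made-up sweep values
    wandb_logger = WandbLogger(project="p", entity="e")  # fresh logger -> new run
    trainer = pl.Trainer(logger=wandb_logger, max_epochs=1)
    trainer.fit(ToyModel(lr), train_loader)
    wandb.finish()  # close this run so the next iteration starts a fresh one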
