Structuring Code for Interruptible Sessions

I'm looking to scale up my project on a cloud service, and interruptible sessions seem to be much cheaper.

How do I use W&B for an experiment (either a single training run or a hyperparameter sweep) in such an environment? Is there anything fancy needed to restart a sweep where it left off?

I’m using PyTorch Lightning for the model/trainer and hoping to use AWS or another cloud service to scale up.



Hi Max!

Here’s some documentation on running wandb sweeps on preemptible instances.

I know I sent this on the PyTorch Lightning forum, but hopefully this’ll help people who find this post.
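For anyone landing here, this is a rough sketch of the general idea for a single training run, not a definitive recipe. It assumes the instance gets a SIGTERM shortly before it is reclaimed (check what your provider actually sends) and that `state_file` lives on storage that survives the interruption. The helper names `load_or_create_run_id` and `start_resumable_run` are made up for illustration; `wandb.init(..., id=..., resume="allow")` and `run.mark_preempting()` are the W&B calls involved.

```python
import signal
import uuid
from pathlib import Path


def load_or_create_run_id(state_file: Path) -> str:
    """Return the run ID stored in state_file, creating one on first launch.

    Hypothetical helper: state_file is assumed to sit on a disk or bucket
    mount that is still there when the job is restarted.
    """
    if state_file.exists():
        return state_file.read_text().strip()
    run_id = uuid.uuid4().hex[:8]
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(run_id)
    return run_id


def start_resumable_run(project: str, state_file: Path):
    """Start a W&B run, or reattach to the same run after a restart."""
    import wandb  # imported lazily so the helper above has no dependencies

    run = wandb.init(
        project=project,
        id=load_or_create_run_id(state_file),
        resume="allow",  # resume if this ID already exists, else start fresh
    )

    def on_sigterm(signum, frame):
        # Assumption: SIGTERM is the preemption notice on this platform.
        # Tell W&B the run is being preempted, then exit with a non-zero
        # status so it is not recorded as a clean finish.
        run.mark_preempting()
        raise SystemExit(1)

    signal.signal(signal.SIGTERM, on_sigterm)
    return run
```

This only covers the W&B side. The model state needs its own checkpoint, which with PyTorch Lightning would typically mean a `ModelCheckpoint` callback and passing `ckpt_path` to `trainer.fit` on restart. For a sweep, the agent assigns run IDs itself, so I'd expect only the SIGTERM handler part to apply there; the preemptible sweeps documentation is the place to confirm that.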

I’d love to hear how you get on experimenting with this and PyTorch Lightning :zap: