Structuring Code for Interruptible Sessions

max_wasserman · September 18, 2021, 1:43pm

I'm looking to scale my project up to a cloud service, and interruptible (preemptible/spot) sessions seem to be much cheaper.

How do I use W&B for an experiment (either a single training run or a hyperparameter sweep) in such an environment? Is anything special needed to restart a sweep where it left off?

I'm using PyTorch Lightning for the model/trainer and hoping to use AWS, Grid.ai, or another cloud service to scale up.
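For a single training run, one way to survive interruptions is to give the run a stable ID and, on each relaunch, resume both the W&B run and the Lightning trainer from the latest checkpoint. The sketch below assumes PyTorch Lightning 1.5 or later (`trainer.fit(..., ckpt_path=...)`; older releases used `Trainer(resume_from_checkpoint=...)`) and a `state_dir` on storage that outlives the instance, such as a mounted persistent volume. `state_dir`, the project name, and the helpers `load_or_create_run_id`, `latest_checkpoint`, and `train` are hypothetical names for illustration.

```python
import json
import os
import uuid
from pathlib import Path
from typing import Optional


def load_or_create_run_id(state_dir: str) -> str:
    """Return the run ID saved in state_dir, creating and saving one on first launch (hypothetical helper)."""
    path = Path(state_dir) / "wandb_run_id.json"
    if path.exists():
        return json.loads(path.read_text())["run_id"]
    path.parent.mkdir(parents=True, exist_ok=True)
    run_id = uuid.uuid4().hex[:8]
    path.write_text(json.dumps({"run_id": run_id}))
    return run_id


def latest_checkpoint(ckpt_dir: str) -> Optional[str]:
    """Path to Lightning's `last.ckpt` if a previous launch wrote one, else None (hypothetical helper)."""
    last = Path(ckpt_dir) / "last.ckpt"
    return str(last) if last.exists() else None


def train(model, datamodule, state_dir: str = "/mnt/persistent/exp1") -> None:
    # Imported here so the two helpers above work without Lightning/wandb installed.
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import ModelCheckpoint
    from pytorch_lightning.loggers import WandbLogger

    run_id = load_or_create_run_id(state_dir)
    # Same ID on every launch plus resume="allow" (forwarded to wandb.init):
    # W&B continues the existing run instead of creating a new one.
    logger = WandbLogger(project="my-project", id=run_id, resume="allow")

    ckpt_dir = os.path.join(state_dir, "checkpoints")
    # save_last=True keeps an up-to-date `last.ckpt` to restart from.
    checkpoint_cb = ModelCheckpoint(dirpath=ckpt_dir, save_last=True)

    trainer = pl.Trainer(max_epochs=10, logger=logger, callbacks=[checkpoint_cb])
    # ckpt_path=None on the first launch trains from scratch; on later launches
    # Lightning restores model, optimizer, and loop state from last.ckpt.
    trainer.fit(model, datamodule=datamodule, ckpt_path=latest_checkpoint(ckpt_dir))
```

Keeping the run ID in a file next to the checkpoints means a relaunched job needs no extra arguments: the same command either starts fresh or picks up where the interrupted one stopped.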

Thanks,
Max

_scott · September 18, 2021, 8:49pm

Hi Max!

Here’s some documentation on running wandb sweeps on preemptible instances.
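For sweeps, the W&B documentation on preemptible runs describes marking each run as preemptible, so that if the instance is reclaimed and the process exits with a non-zero status, the run is put back in the sweep's queue for the next agent instead of being treated as a finished trial. The rough sketch below shows a trial function along those lines. The checkpoint root, the `lr` sweep parameter, and `checkpoint_dir_for_run` are hypothetical, and it assumes (as we read the docs) that a requeued run comes back under the same run ID, which is what lets it find its earlier checkpoints.

```python
import os
import sys


def checkpoint_dir_for_run(root: str, run_id: str) -> str:
    """Per-run checkpoint directory keyed on the W&B run ID (hypothetical helper)."""
    return os.path.join(root, run_id)


def train_one_trial(checkpoint_root: str = "/mnt/persistent/sweep-ckpts") -> None:
    # Imported lazily so checkpoint_dir_for_run can be used without wandb installed.
    import wandb

    run = wandb.init()
    # Mark the run as preemptible as early as possible. Per the W&B docs, a run
    # marked this way that exits with a non-zero status is requeued for the sweep.
    run.mark_preempting()

    ckpt_dir = checkpoint_dir_for_run(checkpoint_root, run.id)
    os.makedirs(ckpt_dir, exist_ok=True)
    lr = run.config["lr"]  # assumes the sweep config defines an `lr` parameter
    # ... build the model, restore from ckpt_dir if a checkpoint exists,
    # train with `lr`, save checkpoints into ckpt_dir, and call run.log(...) ...
    run.finish()


if __name__ == "__main__":
    import wandb

    # The sweep ID is the one printed by `wandb sweep <config.yaml>`.
    wandb.agent(sys.argv[1], function=train_one_trial)
```

To continue the sweep on a fresh instance, start another agent against the same sweep ID (for this sketch, `python trial.py <sweep_id>`, where `trial.py` is a hypothetical file name). The sweep's state lives on the W&B server, so the new agent picks up requeued runs and any remaining trials.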

I know I already sent this on the PyTorch Lightning forum, but hopefully it'll help others who find this post.

I'd love to hear how you get on experimenting with this and PyTorch Lightning.

system · April 20, 2022, 6:02pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

- Issue with W&B Sweeps and Lightning: Code Stops After First Run · W&B Help (sweeps, wandb) · 7 replies · 92 views · August 28, 2024
- Issue with W&B Sweeps and Lightning · W&B Help (sweeps, wandb) · 3 replies · 65 views · September 4, 2024
- Resuming sweep runs on a cluster with job time limits · W&B Help (sweeps) · 8 replies · 1877 views · February 4, 2023
- Resume offline Run · W&B Help · 0 replies · 116 views · January 15, 2025
- Clarification on Early Termination (Hyperband) · W&B Help · 3 replies · 655 views · April 20, 2022