Hugging Face Accelerate + Sweeps

Hi,

I am struggling to get sweeps to work with Hugging Face’s Accelerate library. Specifically, the first run of the sweep works fine, but every subsequent run fails because the Accelerator is re-initialised for each run. From the 2nd run onwards, I get the error: AcceleratorState has already been initialized and cannot be changed, restart your runtime completely and pass mixed_precision='bf16' to Accelerate().

Below is a minimal example of a script which I’m launching using accelerate launch. I’d appreciate any suggestions. Thanks!

import os
from typing import Any, List, Tuple

from accelerate import Accelerator
from torch import Tensor
from torch.utils.data import Dataset, DataLoader
from transformers import (
    Adafactor,
    PreTrainedTokenizerFast,
    T5ForConditionalGeneration,
    T5TokenizerFast,
)
import wandb


class TestDataset(Dataset[Any]):
    def __init__(self, tokenizer: PreTrainedTokenizerFast) -> None:
        super().__init__()
        self._str_prompt = "This is a "
        self._str_target = "test."
        
        self._tokenizer = tokenizer
    
    def __len__(self) -> int:
        return 1

    def __getitem__(self, idx: int) -> Tuple[str, str]:
        return self._str_prompt, self._str_target
    
    def collate(self, batch: List[Tuple[str, str]]) -> Tuple[Tensor, Tensor]:
        prompts = [b[0] for b in batch]
        targets = [b[1] for b in batch]
        
        prompts_tokenized = self._tokenizer(prompts, return_tensors="pt")
        targets_tokenized = self._tokenizer(targets, return_tensors="pt")
        
        return prompts_tokenized["input_ids"], targets_tokenized["input_ids"]


def main() -> None:
    accelerator = Accelerator(log_with="wandb", mixed_precision="bf16")
    
    if accelerator.is_main_process:
        accelerator.init_trackers(os.environ.get("WANDB_PROJECT"))
    
    accelerator.wait_for_everyone()
    
    wandb_tracker = accelerator.get_tracker("wandb")
    multiplier = wandb_tracker.config["multiplier"]
    
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    opt = Adafactor(params=model.parameters())
    
    dataset = TestDataset(tokenizer=tokenizer)
    data_loader = DataLoader(dataset=dataset, collate_fn=dataset.collate)
    
    model, opt, data_loader = accelerator.prepare(model, opt, data_loader)
    
    input_ids, labels = next(iter(data_loader))
    
    loss = model(input_ids=input_ids, labels=labels).loss
    
    loss_gathered = accelerator.gather_for_metrics(loss).mean()
    accelerator.log({"loss": loss_gathered.item() * multiplier})
    
    accelerator.end_training()


if __name__ == "__main__":
    sweep_configuration = {
        "method": "random",
        "metric": {"goal": "maximize", "name": "loss"},
        "parameters": {"multiplier": {"values": list(range(100))}},
    }
    
    sweep_id = wandb.sweep(
        sweep=sweep_configuration,
        project=os.environ.get("WANDB_PROJECT"),
    )
    wandb.agent(sweep_id, function=main, count=3)
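
(For reference, I launch this with something like WANDB_PROJECT=my-project accelerate launch sweep_test.py, where the project name and filename are just placeholders.)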

Hi @harshil, thanks for reporting this! I’ve tested your code in this colab and it seems to be working properly for me. Could you try upgrading wandb, accelerate and transformers to the latest version? Also, would it be possible for you to share the debug files under your local wandb folder so I can have a look at them and see what’s happening here? Thanks!

Hi @harshil, I just wanted to follow up here! Would it be possible for you to try upgrading wandb, accelerate and transformers to the latest version? If the issue persists, would it be possible to share the debug files under your local wandb folder? Thanks!

Hi @luis_bergua1, thanks for your reply! Actually I also shared this issue in the Accelerate repo and was advised that the Accelerator must be instantiated outside the main function, which then worked for me. So it’s interesting that it worked for you without doing so…

However, I will indeed try upgrading all the libraries to the latest version and get back to you on this :+1:

Hi @harshil, great to hear this worked for you! Yes feel free to reach out to me if you need something else.

Hi @luis_bergua1,

Just to let you know, with accelerate-0.16.0, transformers-4.26.1 & wandb-0.13.11 I still get the same issue as above - AcceleratorState has already been initialized and cannot be changed, restart your runtime completely and pass mixed_precision='bf16' to Accelerate().

It works when instantiating the Accelerator outside the main function.
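
For reference, here’s roughly what the working version looks like. This is just a sketch trimmed for brevity; everything inside main() other than the Accelerator construction is unchanged from the script above.

import os

from accelerate import Accelerator
import wandb

# Created once at module level, so wandb.agent can call main() repeatedly
# without re-initialising AcceleratorState on each sweep run.
accelerator = Accelerator(log_with="wandb", mixed_precision="bf16")


def main() -> None:
    # Re-use the module-level Accelerator instead of constructing a new one per run
    if accelerator.is_main_process:
        accelerator.init_trackers(os.environ.get("WANDB_PROJECT"))

    accelerator.wait_for_everyone()

    # ... model / data / training / logging code as in the original script ...

    accelerator.end_training()


if __name__ == "__main__":
    sweep_configuration = {
        "method": "random",
        "metric": {"goal": "maximize", "name": "loss"},
        "parameters": {"multiplier": {"values": list(range(100))}},
    }

    sweep_id = wandb.sweep(
        sweep=sweep_configuration,
        project=os.environ.get("WANDB_PROJECT"),
    )
    wandb.agent(sweep_id, function=main, count=3)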

@luis_bergua1 Related to this, I’m having an issue running on multiple GPUs. I would like to run a sweep where each run of the sweep uses all the GPUs on my machine (i.e. I do not want to parallelise the sweep itself across GPUs, with one run per GPU).

The issue is that when I try to run this with e.g. 2 GPUs, W&B actually creates 2 sweeps and, in total, twice as many runs are performed as I requested with count. So e.g. with the script above where I specified count=3, I get 2 sweeps each with 3 runs, rather than just 1 sweep with 3 runs.

Is there a way around this where instead I only get one sweep, and each run uses all the GPUs?

My accelerate config is as follows, in case it’s useful:

compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
use_cpu: false
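
One pattern I’ve been wondering about (but haven’t verified) is to drive the sweep from the wandb CLI instead of wandb.agent, and let the sweep’s command wrap accelerate launch, so that only a single agent process talks to W&B while each run still uses all the GPUs. Roughly, with a sweep.yaml along these lines, registered via wandb sweep sweep.yaml and then run with wandb agent <sweep_id> from a plain shell (the filename and the shortened parameter list are placeholders, and the script would then need to pick up multiplier from the CLI args or wandb.config rather than from the function-based agent):

program: sweep_test.py
method: random
metric:
  goal: maximize
  name: loss
parameters:
  multiplier:
    values: [0, 1, 2]
command:
  - ${env}
  - accelerate
  - launch
  - ${program}
  - ${args}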

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.