Sweep uses the CPU even though the default device is CUDA

hongyf19 · March 21, 2024, 2:03pm

Hi there, I'm new to wandb. When I use wandb.sweep to tune hyperparameters, training is much slower than it was while I was debugging the model. Running nearly identical code through wandb, the GPU sits almost idle, whereas it reaches up to 100% utilization without the sweep. How can this happen, and what am I missing?

Specifically, I’m using text8 data to train a CBOW model. I followed the tutorial to organize the functions:

def train(config=None):
    with wandb.init(config=config):
        config = wandb.config

        dataset = Text8Dataset(corpus, word_to_id, context_size=config.context_size)
        dataloader = DataLoader(dataset, config.batch_size, shuffle=True)
        negative_sampler = NegativeSampler(corpus, config.alpha)

        model = CBOW(vocab_size, config.embedding_dim)
        optimizer = build_optimizer(model, config.optimizer, config.lr)

        num_negative_samples = config.num_negative_samples

        for epoch in range(config.num_epochs):
            loss = train_epoch(model, dataloader, optimizer, negative_sampler, num_negative_samples)
            # num_epochs is not defined inside train(); read it from the sweep config
            print(f"Epoch {epoch+1}/{config.num_epochs}, Loss: {loss:.4f}")
            wandb.log({"loss": loss, "epoch": epoch})


def train_epoch(model, dataloader, optimizer, negative_sampler, num_negative_samples):
    total_loss = 0
    for context, target in dataloader:
        input_context = torch.transpose(torch.row_stack([context_word for context_word in context]), 0, 1)

        optimizer.zero_grad()
        output = model(input_context)

        loss = negative_sampling_loss(output, target, negative_sampler, num_neg_samples=num_negative_samples)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    
    return total_loss / len(dataloader)
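
The sweep launch itself is not shown above. A minimal sketch of how a train() function like this is typically handed to a sweep, assuming a random search over the config fields that train() reads; the value ranges, the project name, and the launch_sweep helper are hypothetical, not the settings used here:

```python
# Hypothetical sweep setup: the method, metric goal, and value ranges are
# illustrative assumptions, not the settings used in this thread.
sweep_config = {
    "method": "random",
    # "loss" matches the key passed to wandb.log() in train()
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "context_size": {"values": [3, 5]},
        "batch_size": {"values": [256, 512]},
        "alpha": {"values": [0.5, 0.75]},
        "embedding_dim": {"values": [128, 256]},
        "optimizer": {"values": ["adam", "sgd"]},
        "lr": {"values": [0.001, 0.005]},
        "num_negative_samples": {"values": [5, 10]},
        "num_epochs": {"value": 10},
    },
}


def launch_sweep(train_fn, project="cbow-text8", count=5):
    """Register the sweep, then run `count` trials of train_fn."""
    import wandb  # imported lazily so the config can be inspected without wandb

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Calling launch_sweep(train) would start the agent, which invokes train() once per trial with a sampled config.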

Without wandb, I simply run this snippet, which finishes an epoch within 2 minutes:

vocab_size = len(vocab)
embedding_dim = 256 
model = CBOW(vocab_size, embedding_dim)

context_size = 5
batch_size = 512
lr = 0.005
num_epochs = 10

num_negative_samples = 10
alpha = 0.75

# data loading
text8_dataset = Text8Dataset(corpus, word_to_id, context_size=context_size)
dataloader = DataLoader(text8_dataset, batch_size=batch_size, shuffle=True)

# training
#criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
#scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
negative_sampler = NegativeSampler(corpus, alpha)

model.train()
for epoch in range(num_epochs):
    loss = train_epoch(model, dataloader, optimizer, negative_sampler, num_negative_samples)
    # train_epoch already returns the mean loss per batch, so do not divide again
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss:.4f}")

hongyf19 · March 26, 2024, 2:27am

Update: I solved this problem by simply setting

torch.set_default_tensor_type("torch.cuda.FloatTensor")

But honestly, I have no idea why this makes a difference.

artsiom · March 26, 2024, 3:25pm

Hi @hongyf19! Thank you for writing in. I am glad that you were able to resolve the issue.

It seems that torch.set_default_tensor_type("torch.cuda.FloatTensor") explicitly sets the default tensor type to a CUDA tensor (i.e., a tensor that resides in GPU memory). Unless specified otherwise, tensors created afterwards are then CUDA tensors allocated on the GPU, which is what allows the model to benefit from GPU acceleration.
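
An alternative that does not depend on any global default is to place the model and every batch on the device explicitly (note that set_default_tensor_type is deprecated in recent PyTorch releases). The sketch below is a minimal illustration under stated assumptions: TinyCBOW is a hypothetical stand-in for the CBOW model in this thread, the batch is random data, and plain cross-entropy replaces the negative-sampling loss to keep the example short.

```python
import torch
from torch import nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyCBOW(nn.Module):
    """Hypothetical stand-in for the CBOW model discussed in the thread."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.out = nn.Linear(embedding_dim, vocab_size)

    def forward(self, context):
        # Average the context embeddings, then score every vocabulary word.
        return self.out(self.embed(context).mean(dim=1))


# Move the parameters to the device once, before building the optimizer.
model = TinyCBOW(vocab_size=100, embedding_dim=16).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)

# Fake batch: 8 examples with 10 context word ids each, moved to the same device.
context = torch.randint(0, 100, (8, 10)).to(device)
target = torch.randint(0, 100, (8,)).to(device)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(context), target)
loss.backward()
optimizer.step()

print(next(model.parameters()).device, loss.item())
```

In the train() function from the first post, the equivalent change would be calling .to(device) on the model after constructing it and on context and target inside train_epoch.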

hongyf19 · March 27, 2024, 9:40am

Hi @artsiom, thanks for the explanation!
I guess the lesson for beginners (like me) is that using torch.set_default_device('cuda') may not be enough.
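
One way to check whether a default device is actually in effect where the training code runs is to print where the parameters and a freshly created tensor end up. This is a minimal diagnostic sketch; report_devices is a hypothetical helper, and nn.Linear stands in for the real model.

```python
import torch
from torch import nn


def report_devices(model):
    """Return the device types holding the model's parameters and the
    device type that a newly created tensor lands on."""
    param_devices = {p.device.type for p in model.parameters()}
    new_tensor_device = torch.empty(0).device.type
    return param_devices, new_tensor_device


# Stand-in model; in a sweep, call report_devices(model) inside train().
model = nn.Linear(4, 2)
param_devices, new_tensor_device = report_devices(model)
print(f"parameters on: {param_devices}, new tensors on: {new_tensor_device}")
```

If this reports cpu inside train() during a sweep but cuda in the debugging session, the default is not being applied where the sweep runs the function.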

artsiom · March 27, 2024, 2:51pm

Of course! Happy to help.

That's a good lesson. Hopefully it will help more people.

I'll close this ticket out from our side, but if you have any more questions on this topic, you are welcome to post them here.
