wandb.watch doesn't log anything for me

Hi, despite trying many variations and looking through similar issues including #1197 and #2096, I cannot manage to get wandb to log anything about parameters and gradients using wandb.watch(). The documentation has not been helpful. Here’s a sample of what I’m trying to do:

from torchvision.models import resnet18
import torch
import wandb

m = resnet18()
m.train()

opt = torch.optim.SGD(m.parameters(),lr=0.1)

loss_fn = torch.nn.BCEWithLogitsLoss()

#track with wandb
wandb_session = wandb.init(
    entity='kitzeslab',
    project="trying wandb in opensoundscape", 
    name='try basic gradient logging',
)


# tried various versions of this line: with no criterion argument, with criterion=loss_fn, etc.
# tried both wandb.watch() and wandb_session.watch()
# tried 'all' and 'gradients' for log
wandb_session.watch(models=m, log='all')  # also tried criterion=loss_fn and criterion=torch.nn.BCEWithLogitsLoss

#train one epoch:

for samples in train_loader:
    #please just trust that this is how I get samples from my dataloader
    samples = collate_samples(samples)
    tensors = samples['samples']
    labels = samples['labels']

    #tried both forward() and __call__()
    logits = m.__call__(tensors) 

    # calculate loss
    loss = loss_fn(logits,labels.float())

    opt.zero_grad()
    # backward pass: calculate the gradients
    loss.backward()
    # update the network using the gradients*lr
    opt.step()
    wandb_session.log({'loss':loss})
    
wandb_session.finish()

The loss logs to wandb fine, but I don’t have panels for Parameters or Gradients. Am I doing something wrong?

The issue was with log_freq, which defaults to 1000(!). When I specified log_freq=1, it logged everything.

Counter-intuitively, logging occurs after 1000 steps. I expected that it would log after the first step, then once every 1000 steps after that.
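To make the timing concrete, here is a small sketch of a counter-based logger that fires only when the batch count reaches a multiple of log_freq. This is not wandb's actual code; `logged_steps` is a hypothetical helper, and the rule it uses is an assumption chosen to match the behavior described above.

```python
def logged_steps(n_steps, log_freq):
    """Return the 1-based batch numbers at which a counter-based logger fires.

    Assumed rule (not taken from wandb's source): fire whenever the number
    of batches seen so far is a multiple of log_freq.
    """
    fired = []
    for step in range(1, n_steps + 1):
        if step % log_freq == 0:
            fired.append(step)
    return fired


print(logged_steps(2500, 1000))  # [1000, 2000]
print(logged_steps(3, 1))        # [1, 2, 3]
print(logged_steps(500, 1000))   # []
```

Under that assumption, a run with fewer than 1000 batches never records a histogram, which would explain why the Parameters and Gradients panels did not appear at all.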


Hello Sam!

Looks like you found the solution for your issue, which is great! For transparency (and for others who may come across this thread), our documentation on watch states that the default is log_freq: int = 1000. Unfortunately, there is currently no way to make logging start at the first step (besides setting log_freq=1). Would you like me to file a feature request for logging to start at the first step?

Hello! I wanted to follow up with you regarding your support request as I have not heard back from you. Please let me know if we can be of further assistance or if your issue has been resolved.
