Hello folks, I recently started using wandb at work. I tried calling wandb.log() several times within a single epoch at different intervals, and it easily causes errors when the step I pass is not consistent with the internal current step. I found some answers on GitHub and other forums, but they weren't helpful. I finally found the right way with the help of ChatGPT:
Assume we want to log avg_loss every epoch and eval_result every 5 epochs. During the eval step, we might also need to log the results for different classes or under different conditions. In that setting, the code should look like this:
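```python
import wandb

wandb.init(project="my-project", entity="my-entity")

loss = 1
acc1 = 0
acc2 = 0
for i in range(50):
    # Accumulate this epoch's metrics without committing the step yet
    wandb.log({"loss": loss}, step=i, commit=False)
    if i % 5 == 0:
        # Eval results (e.g. per class / per condition) go into the same step
        wandb.log({"acc1": acc1}, step=i, commit=False)
        wandb.log({"acc2": acc2}, step=i, commit=False)
    # Commit exactly once per epoch; the internal step becomes i + 1
    wandb.log({}, commit=True)
```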
The key point here is to set `commit=False` while logging. The internal step only increases by 1 when a log() call is committed, so we need to make sure we commit only once per epoch. Otherwise, when the step we pass is less than the internal step, wandb will complain and that log record will be discarded.
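To make that rule concrete, here is a toy model of the step/commit bookkeeping. This is NOT wandb's implementation: the `ToyRun` class and its fields are hypothetical, and it only mimics the behaviour described in this post (a commit advances the internal step, logging to a smaller step is dropped). Two further rules are my assumptions based on my reading of the wandb docs: `commit` defaults to True only when no `step` is passed, and passing a larger `step` commits whatever was pending.

```python
class ToyRun:
    """Toy model of wandb's step/commit rules (hypothetical, not wandb's code)."""

    def __init__(self):
        self.step = 0        # internal step counter
        self.pending = {}    # data accumulated for the current step
        self.history = []    # committed rows as (step, data)
        self.dropped = []    # rejected records as (requested_step, data)

    def _commit(self):
        # Finalize the current step (if it has data) and advance the counter.
        if self.pending:
            self.history.append((self.step, self.pending))
        self.pending = {}
        self.step += 1

    def log(self, data, step=None, commit=None):
        if step is not None:
            if step < self.step:
                # Going backwards: the record is discarded.
                self.dropped.append((step, dict(data)))
                return
            if step > self.step:
                # Assumption: jumping ahead commits the pending step first.
                if self.pending:
                    self._commit()
                self.step = step
        if commit is None:
            # Assumption: default is commit=True only when no step is given.
            commit = step is None
        self.pending.update(data)
        if commit:
            self._commit()


run = ToyRun()
for i in range(3):
    run.log({"loss": 1.0}, step=i, commit=False)
    if i % 2 == 0:
        run.log({"acc1": 0.0}, step=i, commit=False)
    run.log({}, commit=True)  # one commit per epoch: internal step becomes i + 1

# Logging to step 2 after the internal step reached 3 is dropped.
run.log({"acc2": 0.0}, step=2, commit=False)
print(run.history)
print(run.dropped)
```

The last call shows the failure mode from this post: once an epoch has been committed, any further log() that targets the same (now smaller) step ends up in `dropped` instead of `history`.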
Also make sure that `wandb.log({}, commit=True)` is the last wandb.log() call within an epoch, because once it is committed the internal step becomes i+1. If you run something like `wandb.log({"acc": acc}, step=i, commit=False)` after that, it will cause the same error.
Just always make sure that the step you pass is never less than the internal step and that it keeps increasing.
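If you can gather all of an epoch's metrics before logging, another way to satisfy that rule is to build one dict per epoch and call log once with an explicit step. This is a minimal sketch: `log_epochs` is a hypothetical helper, and it takes the logging function as a parameter so it can be tried without a wandb run. With wandb you would pass `wandb.log` after `wandb.init(...)`.

```python
def log_epochs(log_fn, num_epochs=50, eval_every=5):
    """Hypothetical helper: build ONE metrics dict per epoch and log it once."""
    loss, acc1, acc2 = 1.0, 0.0, 0.0  # placeholder values, as in the snippet above
    for epoch in range(num_epochs):
        metrics = {"loss": loss}
        if epoch % eval_every == 0:
            # Eval results go into the same dict, so no extra log() calls are needed.
            metrics["acc1"] = acc1
            metrics["acc2"] = acc2
        # Exactly one call per epoch, with a strictly increasing step.
        log_fn(metrics, step=epoch, commit=True)


# With wandb (assumes wandb.init(...) was already called):
#     log_epochs(wandb.log)
# Without wandb, record the calls in a list to inspect them:
calls = []
log_epochs(lambda data, step, commit: calls.append((step, data)), num_epochs=10)
```

Because every step is logged and committed exactly once, there is no way for a later call to target a step that has already been passed.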