Just curious, do people sync artifacts with wandb's cloud storage every time they log, or less often? I wanted to know whether calling wandb.log or logging artifacts is slow, or whether it runs in a different process so the slowdown is minimal? … (In the past I discovered that the slowest part of my code was model checkpointing.)
The hard work of wandb.log runs in a different process, so it usually doesn't slow down your code.

The rough guideline we give in our Technical FAQ is that you shouldn't see a performance impact if you are "logging less than once a second and logging less than a few megabytes of data at each step". The Technical FAQ has lots more useful information about the technical details of wandb logging.
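For example, here's a minimal sketch of staying under that guideline by throttling how often you call wandb.log. The LOG_EVERY value, the "demo" project name, and the train_one_step function are all hypothetical stand-ins for your own setup:

```python
import random
import wandb

LOG_EVERY = 50  # assumption: chosen so logging happens well under once per second

def train_one_step():
    # stand-in for your real training step; returns a scalar loss
    return random.random()

run = wandb.init(project="demo")  # hypothetical project name
for step in range(10_000):
    loss = train_one_step()
    if step % LOG_EVERY == 0:
        wandb.log({"loss": loss}, step=step)
run.finish()
```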
Two caveats on avoiding slowdown:

- If you're writing to disk (e.g. with torch.save) as part of your checkpointing, that might slow your training down, even though the W&B logging component of checkpointing does not. If that's a bottleneck, you could handle writing the model to disk in a separate process (see the first sketch after this list).
- wandb.log runs in a separate process, but that process needs to finish before the process that created it (the one where you called wandb.init) can finish. So if you've called wandb.log on a large amount of data during training, it could take some time after your training run finishes for the information to be fully uploaded. You can avoid this with WANDB_MODE=offline, which only saves information locally; synchronization with the cloud service is then done separately, e.g. manually or via a cron job (see the second sketch after this list).
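On the first caveat, here's a minimal sketch of moving the slow disk write off the training path. It uses a background thread rather than a full process, on the assumption that torch.save is mostly I/O-bound; the checkpoint path and the "model" artifact name are placeholders, and the state dict is snapshotted to CPU first so training can keep mutating the live model:

```python
import threading
import torch
import wandb

def save_checkpoint_async(model, path):
    # Snapshot the weights on CPU so the background write sees a consistent copy
    state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}

    def _write():
        torch.save(state, path)  # the slow disk write happens off the training thread
        artifact = wandb.Artifact("model", type="model")  # hypothetical artifact name
        artifact.add_file(path)
        wandb.log_artifact(artifact)

    threading.Thread(target=_write, daemon=True).start()
```

You'd call it like save_checkpoint_async(model, f"ckpt_{step}.pt") inside the training loop, with an active wandb.init run.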
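And on the second caveat, a sketch of running fully offline and syncing later. Passing mode="offline" to wandb.init is equivalent to setting WANDB_MODE=offline in the environment; the project name is again a placeholder:

```python
import wandb

# Nothing is uploaded during the run; everything is written locally under ./wandb/
run = wandb.init(project="demo", mode="offline")  # hypothetical project name

for step in range(100):
    wandb.log({"loss": 1.0 / (step + 1)}, step=step)

run.finish()

# Later (manually or from a cron job), upload the saved run from the CLI:
#   wandb sync wandb/offline-run-*
```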
FYI, we also have a short Colab about ways to reduce the weight of high-frequency logging for metric data, using downsampling, summary statistics, and histograms.
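As one illustration of the histogram idea, here's a sketch that logs a single wandb.Histogram per step instead of thousands of raw scalars; the gradient values are simulated with NumPy, and the project name is a placeholder:

```python
import numpy as np
import wandb

run = wandb.init(project="demo")  # hypothetical project name

for step in range(100):
    grads = np.random.randn(10_000)  # stand-in for per-parameter gradient values
    # One histogram summarizes 10k values far more cheaply than 10k scalar metrics
    wandb.log({"grad_hist": wandb.Histogram(grads)}, step=step)

run.finish()
```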