How often should I log to avoid slowing down my code?

Just curious, do people sync artifacts with wandb's cloud every time they log, or less often? I was wondering whether calling wandb.log or logging artifacts is slow, or whether it runs in a different process so the slowdown is minimal. (In the past I discovered that the slowest part of my code was model checkpointing.)


The heavy lifting of wandb.log happens in a separate process, so it usually doesn't slow down your code.

The rough guideline we give in our Technical FAQ is that you shouldn't see a performance impact if you are "logging less than once a second and logging less than a few megabytes of data at each step". The FAQ also covers many more technical details of wandb logging.
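If you're logging near that limit, the simplest mitigation is to thin your calls. Here is a minimal sketch of step-based throttling, assuming a `wandb.log`-style callable; the `LOG_EVERY` value and the `maybe_log` helper are illustrative, not part of the wandb API:

```python
LOG_EVERY = 50  # assumed cadence; tune so calls land well under once per second

def maybe_log(step, metrics, log_fn=print):
    # Call the expensive logger (e.g. wandb.log) only every LOG_EVERY steps.
    if step % LOG_EVERY == 0:
        log_fn({"step": step, **metrics})
        return True
    return False

# Demo: over 200 steps, only 4 calls actually reach the logger.
logged = [
    step
    for step in range(200)
    if maybe_log(step, {"loss": 1.0 / (step + 1)}, log_fn=lambda m: None)
]
print(logged)  # → [0, 50, 100, 150]
```

The same idea works for histograms and media, which are heavier per call than scalar metrics.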

Two caveats on avoiding slowdown:

  1. If you’re writing to disk (e.g. with torch.save) as part of your checkpointing, that might slow your training down, even though the W&B logging component of checkpointing does not. If that’s a bottleneck, you could handle writing the model to disk in a separate process.
  2. wandb.log runs in a separate process, but that background process must finish before the process that called wandb.init can exit. So if you've logged a large amount of data during training, your run may stall at the end while the remaining information is uploaded. You can avoid this with WANDB_MODE=offline, which saves everything locally; synchronization with the cloud service is then done separately (e.g. manually or via a cron job).

FYI, we also have a short colab on ways to reduce the overhead of high-frequency metric logging using downsampling, summary statistics, and histograms.
