As of client version 0.13.0, wandb service is enabled by default for distributed training. The service addresses the common issues users run into. Please update your CLI, or follow the instructions in the docs on how to use it. Let us know if this resolves your issues.