PyTorch TensorBoard sync in distributed training experiments

Hi there,

I am trying to log my PyTorch training with W&B in an environment with TensorBoardX integration.

The training is performed with the Pointcept codebase, which already has a TensorBoard integration. To get W&B to log the training, I followed the Quickstart guide and put wandb.init() at the beginning of the training script (see the code below).

My Issue:

If I run the training on a single GPU, W&B syncs the TensorBoard logs to the W&B dashboard without problems. If I train on more than one GPU, the run is created on the W&B dashboard, but its charts are empty.

  • In the System tab, some system information is detected, e.g. the GPU utilization of all GPUs.
  • In the Logs tab, no logs are recognized (this usually works with one GPU).
  • If I try to spin up the TensorBoard instance, I get: “No dashboards are active for the current data set.”

My Code:

adapted from the Pointcept/tools/ script:

Main Training Script

Author: Xiaoyang Wu (
Please cite our work if the code is helpful to you.

from pointcept.engines.defaults import (
    default_argument_parser,
    default_config_parser,
    default_setup,
)
from pointcept.engines.train import TRAINERS
from pointcept.engines.launch import launch

import wandb

def main_worker(cfg):
    cfg = default_setup(cfg)
    trainer =, cfg=cfg))

def main():
    args = default_argument_parser().parse_args()
    cfg = default_config_parser(args.config_file, args.options)
    wandb_cfg = cfg.pop("wandb", None)
    if wandb_cfg: 
        if wandb_cfg.track: 
            import wandb
            settings = wandb.Settings(disable_git=True)
            wandb.tensorboard.patch(root_logdir=cfg.save_path, save=True, tensorboard_x=True)




if __name__ == "__main__":
    main()
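One hypothesis that would fit these symptoms, and which is not confirmed anywhere in this thread: wandb.tensorboard.patch() works by monkey-patching the TensorBoard writer inside the process that calls it. If the launcher starts the multi-GPU workers as freshly spawned processes (an assumption about Pointcept's launch, typical for torch.multiprocessing.spawn) but runs the single-GPU case in the calling process, the workers would never see the patch. The sketch below only illustrates that general mechanism with the standard library; the attribute patched_by_parent is a made-up marker and a plain subprocess stands in for a spawned worker.

```python
import json
import subprocess
import sys


def patch_in_this_process():
    # Stand-in for a monkey patch such as wandb.tensorboard.patch(...):
    # it only changes module objects living in the current interpreter.
    # "patched_by_parent" is a hypothetical marker, not a real attribute.
    json.patched_by_parent = True


def is_patched_in_fresh_interpreter():
    # A fresh interpreter (used here as a stand-in for a spawned worker)
    # re-imports its modules, so it starts from the unpatched state.
    code = "import json; print(hasattr(json, 'patched_by_parent'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


patch_in_this_process()
print("patched in parent:", hasattr(json, "patched_by_parent"))
print("patched in fresh worker:", is_patched_in_fresh_interpreter())
```

If this is indeed what happens, applying the patch inside main_worker (in the process that creates the writer) rather than in main() would be the thing to try, but that is untested here.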

Hi @rauch - Thanks for reaching out with your question!

Would you mind sharing some additional information on your training environment:

  • Are you running this locally or on a cloud platform (if so, which one)? Which GPUs are you using for the training?
  • What version of wandb SDK have you got installed? What other libraries and frameworks do you also have installed?
  • Are you running it through a Jupyter Notebook?
  • Could you share the debug.log and debug-internal.log for the run training on multiple GPUs? These should be in the ./wandb/run-date_time-runid/logs/ folder

This information will help us investigate what could be preventing the data from being logged properly in your case.
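If it helps with locating the two log files, here is a minimal sketch that picks the most recent run folder under ./wandb and returns the debug logs it contains. It assumes the folder layout mentioned above (run-date_time-runid/logs/); the function name find_latest_debug_logs is made up for this example.

```python
from pathlib import Path


def find_latest_debug_logs(wandb_dir="./wandb"):
    """Return the debug logs of the most recent run folder, or an empty list."""
    # Run folders are named run-<date>_<time>-<runid>, so sorting the names
    # alphabetically also sorts them chronologically.
    run_dirs = sorted(p for p in Path(wandb_dir).glob("run-*") if p.is_dir())
    if not run_dirs:
        return []
    logs_dir = run_dirs[-1] / "logs"
    wanted = ("debug.log", "debug-internal.log")
    return [logs_dir / name for name in wanted if (logs_dir / name).is_file()]


if __name__ == "__main__":
    for log_path in find_latest_debug_logs():
        print(log_path)
```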


Sorry for missing these important details.

I run it on a private GPU cluster with up to 16 GPUs. The GPUs are NVIDIA V100 SXM3 with 32 GB HBM2 memory.

The training is executed from the terminal inside a Docker container, not through a Jupyter Notebook. See below for the Python environment.

As I said,

Driver Version: 470.161.03
CUDA Version: 11.4

python pip list:

Package                   Version
------------------------- -----------------------
zstandard                 0.19.0

W&B Console Output

wandb: Currently logged in as: r**h. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.3 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.0
wandb: Run data is saved locally in /workspace/Pointcept/wandb/run-20240222_151348-v7bgm46o
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run luminous-orchid-11
wandb: ⭐️ View project at**h/RB3D%20multi
wandb: 🚀 View run at**h/RB3D%20multi/runs/v7bgm46o


Hi @rauch - thank you for your patience while we investigate this. I have now escalated this internally and will keep you posted with any updates.

One more piece of information which would be useful to know - did you try running the same training job without logging to W&B on multiple GPUs, and would you be able to visualise the logged data?
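One way to check this independently of W&B is to look for TensorBoard event files in the training output directory after a multi-GPU run. The sketch below assumes the writer uses the usual events.out.tfevents.* file naming; the function name list_event_files and the example path are placeholders, so substitute your own save path.

```python
from pathlib import Path


def list_event_files(log_dir):
    """Return (path, size_in_bytes) for each TensorBoard event file under log_dir."""
    return sorted(
        (p, p.stat().st_size)
        for p in Path(log_dir).rglob("events.out.tfevents.*")
        if p.is_file()
    )


if __name__ == "__main__":
    # "exp/my_run" is a placeholder for the save path of the training run.
    for path, size in list_event_files("exp/my_run"):
        print(f"{size:>10} bytes  {path}")
```

If files show up here with a non-trivial size but the W&B charts stay empty, the data is being written locally and the problem is on the syncing side.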