Wandb launch-agent doesn't have GPU support

qisenskaist · February 20, 2024, 1:45am

Hi,
In the conda environment, torch, torchvision, and nvidia-smi all work properly, and standalone Python scripts run fine. But when those scripts are launched through the queue, the launch-agent tries to start a Docker container and the run fails with the GPU error below.

Traceback (most recent call last):
  File "train.py", line 130, in <module>
    train(defaults)
  File "train.py", line 80, in train
    images, labels = images.to(config.device), labels.to(config.device)
  File "/env/lib/python3.7/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from Official Drivers | NVIDIA
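
Since the same script works on the host but fails inside the container, one way to narrow this down is to run a small check inside the container and compare it with the host. The following is only a rough sketch: `gpu_visibility_report` is a hypothetical helper name, and it assumes `nvidia-smi` would be on the PATH whenever the driver is exposed to the container.

```python
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; not part of wandb or torch.
    """
    report = {
        "nvidia_smi_on_path": False,
        "nvidia_smi_ok": False,
        "torch_cuda_available": None,  # stays None if torch is not installed
    }

    # Inside a container started without GPU access, nvidia-smi is
    # typically missing or fails to talk to the driver.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_on_path"] = True
        try:
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            report["nvidia_smi_ok"] = False

    # torch.cuda.is_available() returns False instead of raising
    # when no driver is visible.
    try:
        import torch

        report["torch_cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If the host reports a working GPU and the container does not, the problem is likely in how the container is started rather than in the training script.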

qisenskaist · February 20, 2024, 3:13am

Closing this after a couple of back-and-forths:

Install the NVIDIA Container Toolkit:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
and put the "gpus": "all" option in the queue configuration.
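
For anyone landing here later, this is roughly what the queue configuration looks like as JSON. This is a minimal sketch, assuming a Docker queue whose configuration is a JSON object that the agent passes through as container options (similar in effect to `docker run --gpus all`). `docker_queue_config` is a hypothetical helper used only to print the JSON, and any other keys your queue needs are not shown.

```python
import json


def docker_queue_config(gpus="all"):
    """Build a minimal queue configuration that requests GPUs.

    Hypothetical helper for illustration; only the "gpus" key comes
    from the fix described above.
    """
    return {"gpus": gpus}


if __name__ == "__main__":
    # Paste the printed JSON into the queue configuration.
    print(json.dumps(docker_queue_config(), indent=2))
```

The NVIDIA Container Toolkit still has to be installed on the machine running the agent, otherwise Docker cannot expose the GPUs to the container.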
