W&B launch agent doesn't have GPU support

Hi
In my conda environment, torch, torchvision, and nvidia-smi all work properly, and standalone Python scripts run fine. But when I put those scripts in the launch queue, the launch agent starts them in a Docker container, and the run fails with the GPU error below:

Traceback (most recent call last):
  File "train.py", line 130, in <module>
    train(defaults)
  File "train.py", line 80, in train
    images, labels = images.to(config.device), labels.to(config.device)
  File "/env/lib/python3.7/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
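One way to narrow this down is to run a small check inside the container the launch agent starts and compare it with the conda environment. The sketch below uses only the standard library plus PyTorch if it is installed; `gpu_diagnostics` is a hypothetical helper name, not part of wandb or torch. If both `nvidia_smi_on_path` and `cuda_available` come back False inside the container but True on the host, the container was most likely started without GPU access.

```python
import importlib.util
import shutil
from typing import Dict


def gpu_diagnostics() -> Dict[str, bool]:
    """Report whether this process can see NVIDIA tooling and a CUDA device."""
    report = {
        # nvidia-smi ships with the host driver; inside a container it is usually
        # only present when GPUs were requested (e.g. `docker run --gpus ...`).
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        # is_available() returns False instead of raising the
        # "Found no NVIDIA driver" RuntimeError when no driver is visible.
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```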

Closing this after a couple of back-and-forths. The fix was:

1. Install the NVIDIA Container Toolkit on the machine running the launch agent:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
2. Add the "gpus": "all" option to the queue configuration.
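For a Docker queue, the "gpus": "all" entry should correspond to the `--gpus all` flag of `docker run`, which is the flag the NVIDIA Container Toolkit hooks into. The sketch below is a hypothetical illustration of how a flat queue config could be turned into `docker run` flags; `queue_config_to_docker_args` is not part of the wandb package, and the exact translation the launch agent performs is an assumption here.

```python
from typing import Any, Dict, List


def queue_config_to_docker_args(config: Dict[str, Any]) -> List[str]:
    """Hypothetical sketch: flatten a Docker queue config into `docker run` flags."""
    args: List[str] = []
    for key, value in config.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            # Boolean options become bare flags; False means "omit the flag".
            if value:
                args.append(flag)
        elif isinstance(value, list):
            # Repeat the flag once per entry, e.g. several --env values.
            for item in value:
                args.extend([flag, str(item)])
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    queue_config = {"gpus": "all"}
    print(queue_config_to_docker_args(queue_config))
```

The resulting list would be spliced into a command such as `["docker", "run", *args, image]`; without the `--gpus all` part, Docker starts the container with no GPU devices, which matches the error above.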