I had a working WandB launch queue and agent configured. Today I updated the Kubernetes config to add a PodDisruptionBudget for the WandB jobs. During this process the agent Helm chart was also re-run, and I saw that the agent image was updated from 0.16.0-state-fix to 0.16.3. After this I ran a pipeline to enqueue a demo experiment and check that everything was fine, but the run crashes because of an agent issue. The agent logs show:
wandb: ERROR launch: Error running job: Exception when ensuring Kubernetes API key secret: Server disconnected
I have a valid API key. The same key was used to enqueue the jobs and create the queues, and it is also the key in the agent config.
What has changed?