Kubernetes pods that request NVIDIA GPUs fail to start, and the pod events show the following error:
failed to create containerd container: CDI device injection failed: unresolvable CDI devices
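To list which pods are affected, Warning events can be filtered for this message. The sketch below is not part of the official procedure. It assumes the default column layout of `kubectl get events -A` (NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE), and `find_cdi_failures` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# find_cdi_failures: reads `kubectl get events -A` output on stdin and prints
# "<namespace> <object>" for every event whose message mentions the CDI error.
# Assumes the default column order: NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
find_cdi_failures() {
  awk '/unresolvable CDI devices/ { print $1, $5 }'
}

# Example usage against a live cluster:
#   kubectl get events -A --field-selector type=Warning | find_cdi_failures
```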
Private AI Services version 2.1 using GPU Operator v25.10.1
This issue is caused by a configuration conflict between the NVIDIA GPU Operator and the containerd runtime.
To work around the issue, customise the GPU Operator Helm chart values to disable CDI (Container Device Interface).
In Private AI Services, this can be done by creating a ConfigMap that holds the Helm values overrides, for example:
$ cat helm-values.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: helm-values
  namespace: my-namespace
data:
  values.yaml: |
    cdi:
      enabled: false
    toolkit:
      version: "v1.18.2"
      env:
        - name: NVIDIA_CONTAINER_RUNTIME_MODE
          value: "legacy"
        - name: CDI_ENABLED
          value: "false"
    # driver:
    #   version: "580.126.09"
    operator:
      logging:
        level: debug
Apply the ConfigMap in the same namespace, then edit the PAISConfiguration resource to reference it:
$ kubectl -n my-namespace apply -f helm-values.yaml
$ kubectl edit paisconfiguration default -n my-namespace
spec:
  nvidiaConfig:
    gpuOperatorOverridesRef:
      name: helm-values
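Because these steps depend on exact YAML indentation, they can also be scripted. The sketch below is an illustration, not the official procedure. It assumes that the ConfigMap and the `default` PAISConfiguration live in the same namespace, as the commands above imply. `write_helm_values` and `overrides_patch` are hypothetical helper names, and `my-namespace` is a placeholder.

```shell
#!/usr/bin/env bash
# write_helm_values NAMESPACE FILE: writes the helm-values ConfigMap to FILE
# with the nested values.yaml indented explicitly (same values as above).
write_helm_values() {
  ns="$1"
  out="$2"
  cat > "$out" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: helm-values
  namespace: ${ns}
data:
  values.yaml: |
    cdi:
      enabled: false
    toolkit:
      version: "v1.18.2"
      env:
        - name: NVIDIA_CONTAINER_RUNTIME_MODE
          value: "legacy"
        - name: CDI_ENABLED
          value: "false"
    # driver:
    #   version: "580.126.09"
    operator:
      logging:
        level: debug
EOF
}

# overrides_patch [NAME]: prints a JSON merge patch that sets
# spec.nvidiaConfig.gpuOperatorOverridesRef.name (default: helm-values).
overrides_patch() {
  printf '{"spec":{"nvidiaConfig":{"gpuOperatorOverridesRef":{"name":"%s"}}}}\n' "${1:-helm-values}"
}

# Example usage against a live cluster:
#   write_helm_values my-namespace helm-values.yaml
#   kubectl -n my-namespace apply -f helm-values.yaml
#   kubectl -n my-namespace patch paisconfiguration default --type merge -p "$(overrides_patch)"
```

Using `kubectl patch --type merge` sets the same field as the interactive `kubectl edit` step without opening an editor, which makes the change easier to repeat across namespaces.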