Failed to create containerd container: CDI device injection failed: unresolvable CDI devices

Article ID: 437128

Products

VCF Private AI Services

Issue/Introduction

Kubernetes pods that use NVIDIA GPUs fail to start. The pod events show the following error:

failed to create containerd container: CDI device injection failed: unresolvable CDI devices
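
To confirm that a pod is hitting this error, its events can be inspected with kubectl. The following is a minimal sketch: the helper name `cdi_events` and the `CDI_ERROR_PATTERN` variable are illustrative assumptions and not part of Private AI Services.

```shell
# Substring of the error message reported in this article.
CDI_ERROR_PATTERN="unresolvable CDI devices"

# cdi_events NAMESPACE
# Hypothetical helper: lists the namespace's events in time order and keeps
# only the lines that mention the CDI injection failure.
cdi_events() {
  kubectl -n "$1" get events --sort-by=.lastTimestamp | grep -i "$CDI_ERROR_PATTERN"
}
```

For example, `cdi_events my-namespace` prints any matching events in that namespace. Running `kubectl -n my-namespace describe pod <pod-name>` shows the same events for a single pod.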

Environment

Private AI Services version 2.1 using GPU Operator v25.10.1

 

Cause

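This issue occurs due to a configuration conflict between the NVIDIA GPU Operator and the containerd runtime: containerd attempts CDI (Container Device Interface) device injection but cannot resolve the requested CDI devices.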

Resolution

To work around the issue, customise the GPU Operator Helm chart values to disable CDI (Container Device Interface).

In Private AI Services, this can be accomplished as follows:

  1. Create a ConfigMap like the one below:
    $ cat helm-values.yaml
    
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: helm-values
      namespace: my-namespace  # must match the namespace used in the kubectl commands below
    data:
      values.yaml: |
        cdi:
          enabled: false
        toolkit:
          version: "v1.18.2"
          env:
            - name: NVIDIA_CONTAINER_RUNTIME_MODE
              value: "legacy"
            - name: CDI_ENABLED
              value: "false"
        # driver:
        #   version: "580.126.09"
        operator:
          logging:
            level: debug
  2. Apply the ConfigMap.
    $ kubectl -n my-namespace apply -f helm-values.yaml
  3. Add the following configuration to the spec section of the PAISConfiguration resource in the same namespace.
    $ kubectl edit paisconfiguration default -n my-namespace
    
    spec:
      nvidiaConfig:
        gpuOperatorOverridesRef:
          name: helm-values
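
As a non-interactive alternative to `kubectl edit` in step 3, the same field can likely be set with a JSON merge patch and then read back to confirm the change. This is a sketch, assuming the PAISConfiguration resource is named `default` as in the command above. The helper names `overrides_patch`, `set_overrides_ref`, and `get_overrides_ref` are hypothetical.

```shell
# overrides_patch CONFIGMAP_NAME
# Prints a JSON merge patch that points gpuOperatorOverridesRef at the ConfigMap.
overrides_patch() {
  printf '{"spec":{"nvidiaConfig":{"gpuOperatorOverridesRef":{"name":"%s"}}}}' "$1"
}

# set_overrides_ref NAMESPACE CONFIGMAP_NAME
# Applies the patch to the PAISConfiguration named "default" (assumed name,
# taken from the kubectl edit command in step 3).
set_overrides_ref() {
  kubectl -n "$1" patch paisconfiguration default --type merge -p "$(overrides_patch "$2")"
}

# get_overrides_ref NAMESPACE
# Prints the ConfigMap name currently referenced, to confirm the change.
get_overrides_ref() {
  kubectl -n "$1" get paisconfiguration default \
    -o jsonpath='{.spec.nvidiaConfig.gpuOperatorOverridesRef.name}'
}
```

For example, `set_overrides_ref my-namespace helm-values` followed by `get_overrides_ref my-namespace` should print `helm-values` if the patch was accepted.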