The following error occurs while deploying the NVIDIA Operator to run AI/ML workloads on TKG Service clusters:
INSTALLATION FAILED: failed to install CRD crds/nvidia.com_clusterpolicies.yaml: Post "https://10.1x.x.x:6443/apis/apiextensions.k8s.io/v1/customresourcedefinitions?fieldManager=helm": http2: client connection lost
You may also encounter the following error on some NVIDIA Operator pods:
Error: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured.
vSphere with Tanzu 8.x
The NVIDIA Operator has multiple components that require additional Custom Resource Definitions (CRDs) for proper operation.
One of these components is the NVIDIA Container Toolkit, which provides the nvidia-container-runtime needed for the containers to run.
To deploy the NVIDIA Operator, the cluster must be able to reach the online repositories that host the required images and CRDs.
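Before retrying the deployment, it can help to confirm that the required repositories are reachable from the environment. The following is a minimal sketch of such a check using plain TCP connections. The endpoint list is an assumption, since this article does not name the repositories: nvcr.io and helm.ngc.nvidia.com are used only as examples and should be replaced with the registries and chart repositories your deployment actually pulls from. The helper names tcp_connect and check_endpoints are hypothetical.

```python
import socket

# Assumed endpoints: not listed in this article. Replace with the
# registries and Helm repositories your deployment actually uses.
ASSUMED_ENDPOINTS = [("nvcr.io", 443), ("helm.ngc.nvidia.com", 443)]


def tcp_connect(host: str, port: int, timeout: float = 5.0) -> None:
    """Open and close a TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def check_endpoints(endpoints, connect=tcp_connect) -> dict:
    """Map each "host:port" to True (reachable) or False (unreachable).

    `connect` is injectable so the logic can be exercised without
    network access. DNS failures and timeouts are subclasses of OSError
    and are therefore reported as unreachable.
    """
    results = {}
    for host, port in endpoints:
        try:
            connect(host, port)
            results[f"{host}:{port}"] = True
        except OSError:
            results[f"{host}:{port}"] = False
    return results
```

Calling check_endpoints(ASSUMED_ENDPOINTS) from a machine on the same network as the cluster nodes returns one entry per endpoint. A successful TCP connection only shows that the port is reachable; it does not validate proxy settings, TLS trust, or registry credentials.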