VKS Cluster Nodes Fail to Delete or Power On After Updating the Deployment

Article ID: 426619


Products

VMware vSphere Kubernetes Service

Issue/Introduction

The following symptoms are experienced in a VKS workload cluster:
- Multiple application and system pods are in a CrashLoopBackOff status
- Node deletion operations hang
- New nodes are provisioned but cannot power on
- Events for the pods are similar to the following:

Warning  FailedAttachVolume  5m6s (x3 over 9m11s)  ############-##########  AttachVolume.Attach failed for volume "###-########-####-####-####-############" : rpc error: code = Internal desc = failed to get CnsFileAccessConfig instance: "#####-###-########"/"". Error: failed to get API group resources: unable to retrieve the complete list of server APIs: cns.vmware.com/v1alpha1: Get "https://supervisor.default.svc:6443/apis/cns.vmware.com/v1alpha1": dial tcp: lookup supervisor.default.svc on ###.###.###.###:53: read udp ###.###.###.###:58191->###.###.###.###:53: read: no route to host
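
To confirm these symptoms from the workload cluster context, a few read-only kubectl checks are usually enough. This is a minimal sketch; the pod name and namespace below are placeholders:

# List pods stuck in CrashLoopBackOff across all namespaces
kubectl get pods -A | grep -i crashloopbackoff

# Review the events for one affected pod (substitute real names)
kubectl describe pod <pod-name> -n <namespace>

# Check node state; nodes that fail to delete or power on typically
# show NotReady or remain listed after a delete request
kubectl get nodes -o wide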

Environment

vSphere with Kubernetes Supervisor
vSphere Kubernetes Release 1.32 and later

Cause

The Kubernetes master (control plane) and/or the KubeDNS pods on the workload cluster may be experiencing issues. In the event above, the DNS lookup of supervisor.default.svc fails with "no route to host", so the vSphere CSI driver cannot reach the Supervisor API to attach volumes, and dependent pods enter CrashLoopBackOff.
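
Because the event points at a failed DNS lookup, verifying the control plane and cluster DNS pods is a reasonable first check. A minimal sketch, assuming the standard kube-system labels used by kubeadm-based clusters (tier=control-plane, k8s-app=kube-dns) and a hypothetical busybox test pod:

# Control plane (Kubernetes master) pods on the workload cluster
kubectl get pods -n kube-system -l tier=control-plane

# KubeDNS/CoreDNS pods and their recent logs
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

# Test in-cluster resolution of the Supervisor service; a healthy
# cluster returns an IP, while this issue reproduces the lookup error
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup supervisor.default.svc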

Resolution

Ensure the necessary system pods and containers are running on the cluster. 
See the following guidance for additional information:
Verify the Status of the vSphere with Tanzu Resources for Developer Ready Infrastructure
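
As a quick way to verify the system pods described in that guidance, you can list anything that is not Running or Completed on both the Supervisor and the workload cluster. A hedged sketch; the login flags and the vSphere Namespace name are placeholders for your environment, and the Machine check assumes the Cluster API CRDs present on the Supervisor:

# Log in to the Supervisor (placeholders for your environment)
kubectl vsphere login --server=<supervisor-ip> --vsphere-username <user>

# Any system pod not Running/Completed needs attention
kubectl get pods -A | grep -Ev 'Running|Completed'

# On the Supervisor, Cluster API Machine objects for the workload
# cluster show node lifecycle state; stuck deletions remain in a
# Deleting phase
kubectl get machines -n <vsphere-namespace>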