vSphere Kubernetes Cluster Node in "NotReady" State after the cluster upgrade.

Article ID: 393627

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The virtual machine backing a vSphere Kubernetes cluster node is no longer listed in the vSphere Client.
  • When checked from the Supervisor with the command "kubectl get vm -n <namespace>", the VM is not listed.
  • The node is still listed, in the NotReady state, when you log in to the guest TKC control plane and run the command:

# kubectl get nodes -A

Sample Output:

NAME          STATUS     ROLES           AGE   VERSION
<nodename>    Ready      control-plane   33d   <TKr Version>
<nodename>    Ready      control-plane   42d   <TKr Version>
<nodename>    Ready      control-plane   42d   <TKr Version>
<nodename>    NotReady   <none>          18d   <TKr Version>
<nodename>    Ready      <none>          18d   <TKr Version>
<nodename>    Ready      <none>          18d   <TKr Version>
<nodename>    Ready      <none>          18d   <TKr Version>
<nodename>    Ready      <none>          18d   <TKr Version>
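
On clusters with many nodes, the affected entries can be picked out of this output with a small filter. The following is a minimal sketch, not part of the product tooling; the function name list_notready_nodes is hypothetical, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Hypothetical helper: print the names of nodes whose STATUS starts with
# "NotReady". Reads `kubectl get nodes` output on stdin and assumes the
# default column order (NAME, STATUS, ...).
list_notready_nodes() {
    awk '$2 ~ /^NotReady/ { print $1 }'
}
```

Example usage from the guest cluster control plane:

# kubectl get nodes | list_notready_nodes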

 

  • The vmware-system-vmop_vmware-system-vmop-controller-manager logs on the Supervisor confirm that the node's virtual machine was successfully deleted:

    [YYYY-MM-DDTHH:MM:SS] stderr F I0328 20:22:26.271311       1 power_state.go:294] "Hard power op" logger="vsphere" vmName="#-#-ns/#-#-#-#-node-pool-#-#-#" currentPowerState="poweredOn" desiredPowerState="poweredOff" desiredPowerOpBehavior="Hard" powerOpHardFn="PowerOff"
    [YYYY-MM-DDTHH:MM:SS] stderr F I0328 20:22:27.895371       1 virtualmachine_controller.go:288] "Provider Completed deleting Virtual Machine" logger="VirtualMachine" name="#-#-ns/#-#-#-#-node-pool-#-#-#" time="YYYY-MM-DDTHH:MM:SS"
    [YYYY-MM-DDTHH:MM:SS] stderr F I0328 20:22:27.895439       1 virtualmachine_controller.go:295] "Finished Reconciling VirtualMachine Deletion" logger="VirtualMachine" name="#-#-ns/#-#-#-#-node-pool-#-#-#"
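
To check a saved copy of this log for a specific node, the deletion message can be matched against the VM name. This is a minimal sketch that assumes the log format shown above; the function name vm_deletion_confirmed and the file name vmop.log are hypothetical.

```shell
# Hypothetical helper: exit 0 if the log on stdin contains a
# "Finished Reconciling VirtualMachine Deletion" entry for the given VM.
# $1 is the VM identifier as it appears in the log: "<namespace>/<vm-name>".
vm_deletion_confirmed() {
    vm="$1"
    # Fixed-string matches avoid treating the VM name as a regular expression.
    grep -F 'Finished Reconciling VirtualMachine Deletion' \
        | grep -qF "name=\"$vm\""
}
```

Example usage, with the controller-manager log saved to vmop.log:

# vm_deletion_confirmed "<namespace>/<nodename>" < vmop.log && echo "deletion confirmed"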

Cause

The node's virtual machine was deleted from the Supervisor and the vSphere Client, but a stale node entry remained in the database of the guest cluster control plane.

Resolution

Delete the stale node from the TKC control plane with the command:

# kubectl delete node <nodename>

Note: Delete the node only if it is not visible in the vSphere Client and is not listed as a virtual machine in the Supervisor. If the node is still present in the vSphere Client or in the Supervisor, contact Broadcom Support.
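
The Supervisor half of this precondition can be scripted as a guard around the delete. The sketch below is illustrative only and rests on several assumptions: the node name matches the VirtualMachine object name, SUPERVISOR_CTX and GUEST_CTX are kubeconfig context names you have defined, and the function name delete_stale_node is hypothetical. It does not check the vSphere Client, which still has to be verified manually.

```shell
# Hypothetical helper: delete a guest cluster node only when no VirtualMachine
# object with the same name exists in the Supervisor namespace.
# Usage: delete_stale_node <nodename> <supervisor-namespace>
# Assumes SUPERVISOR_CTX and GUEST_CTX hold kubeconfig context names.
delete_stale_node() {
    node="$1"
    ns="$2"
    kubectl_bin="${KUBECTL:-kubectl}"

    # Abort if the Supervisor cannot be queried: an error is not proof of absence.
    vms=$("$kubectl_bin" --context "$SUPERVISOR_CTX" get vm -n "$ns" -o name) || return 2

    # "-o name" prints "<resource>/<name>"; compare the part after the last slash.
    if printf '%s\n' "$vms" | awk -F/ -v n="$node" '$NF == n { found = 1 } END { exit !found }'; then
        echo "VM $node still exists in namespace $ns; not deleting the node" >&2
        return 1
    fi

    "$kubectl_bin" --context "$GUEST_CTX" delete node "$node"
}
```

A return code of 1 means the VM is still present in the Supervisor, in which case the node should be left in place and Broadcom Support contacted.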