Kubernetes nodes keep flapping between Ready and NotReady status in a vSphere Kubernetes Service (VKS) cluster

Article ID: 431020

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The cluster node keeps getting into a NotReady state.

    # kubectl get node -o wide 

    NAME                           STATUS     ROLES           AGE   VERSION                 INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION   CONTAINER-RUNTIME
    cp-name-hndtr-mqn4m           Ready      control-plane   27d   v1.33.3+vmware.1-fips   10.244.0.54   <none>        VMware Photon OS/Linux   6.1.148-1.ph5    containerd://2.0.6+vmware.1-fips
    worker-name-np-6ora-sjp8qbk   Ready      <none>          27d   v1.33.3+vmware.1-fips   10.#.#.55     <none>        VMware Photon OS/Linux   6.1.148-1.ph5    containerd://2.0.6+vmware.1-fips
    worker-name-np-6ora-sjwrz65   NotReady   <none>          27d   v1.33.3+vmware.1-fips   10.#.#.50     <none>        VMware Photon OS/Linux   6.1.148-1.ph5    containerd://2.0.6+vmware.1-fips
    worker-name-np-6ora-sjx4txq   Ready      <none>          21d   v1.33.3+vmware.1-fips   10.#.#51      <none>        VMware Photon OS/Linux   6.1.148-1.ph5    containerd://2.0.6+vmware.1-fips
  • Describing the NotReady node (kubectl describe node) shows the message "Kubelet stopped posting node status" in the node conditions.

    Conditions:
      Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
      ----             ------    -----------------                 ------------------                ------              -------
      MemoryPressure   Unknown   Thu, 26 Feb 2026 21:33:08 -0700   Thu, 26 Feb 2026 21:34:43 -0700   NodeStatusUnknown   Kubelet stopped posting node status.
      DiskPressure     Unknown   Thu, 26 Feb 2026 21:33:08 -0700   Thu, 26 Feb 2026 21:34:43 -0700   NodeStatusUnknown   Kubelet stopped posting node status.
      PIDPressure      Unknown   Thu, 26 Feb 2026 21:33:08 -0700   Thu, 26 Feb 2026 21:34:43 -0700   NodeStatusUnknown   Kubelet stopped posting node status.
      Ready            Unknown   Thu, 26 Feb 2026 21:33:08 -0700   Thu, 26 Feb 2026 21:34:43 -0700   NodeStatusUnknown   Kubelet stopped posting node status.
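    The conditions above can be viewed directly by describing the NotReady node; the node name below is the example name and must be replaced with the actual node name:

    # kubectl describe node worker-name-np-6ora-sjwrz65

    # kubectl get node worker-name-np-6ora-sjwrz65 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'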
  • kubectl top node shows no CPU and memory usage for the NotReady node.

    # kubectl top node

    NAME                           CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
    cp-name-hndtr-mqn4m           528m         27%         1488Mi          108%
    worker-name-np-6ora-sjp8qbk   81m          4%          1199Mi          87%
    worker-name-np-6ora-sjx4txq   77m          3%          1131Mi          82%
    worker-name-np-6ora-sjwrz65   <unknown>    <unknown>   <unknown>       <unknown>
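    To watch the node flap between Ready and NotReady over time, the node list and the node's events can be monitored; the node name below is the example name and must be replaced:

    # kubectl get nodes -w

    # kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=worker-name-np-6ora-sjwrz65 --sort-by=.lastTimestamp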

  • SSH to the problematic node fails or fails intermittently.

  • Cluster API (CAPI) keeps marking the node as unhealthy, and the node gets recreated if it does not return to the Ready state before reaching the Ready=False timeout threshold set on the MachineHealthCheck.

    +++var/log/pods/svc-tkg-domain-c142716_capi-controller-manager-685d788c7d-qrqdc_######-#######-05062c3afa90/manager/0.log+++

    2026-02-17T20:09:18.849295755Z stderr F I0217 20:09:18.849132       1 recorder.go:104] "Machine cluster-name-lnw/worker-name-np-6ora-sjwrz65 has unhealthy Node " logger="events" type="Normal" object={"kind":"Machine","namespace":"cluster-ns","name":"worker-name-np-6ora-sjwrz65","uid":"a8015209-####_#####-556447f8583a","apiVersion":"cluster.x-k8s.io/v1beta1","resourceVersion":"40870933"} reason="DetectedUnhealthy"
    2026-02-17T20:09:18.872372531Z stderr F I0217 20:09:18.872282       1 recorder.go:104] "Machine cluster-name-lnw/worker-name-np-6ora-sjwrz65 has unhealthy Node " logger="events" type="Normal" object={"kind":"Machine","namespace":"cluster-ns","name":"worker-name-np-6ora-sjwrz65","uid":"a8015209-####_#####-556447f8583a","apiVersion":"cluster.x-k8s.io/v1beta1","resourceVersion":"40870938"} reason="DetectedUnhealthy"
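    The Ready=False timeout that triggers remediation can be confirmed by inspecting the MachineHealthCheck object from the Supervisor cluster context; the namespace below is taken from the example log line, and <mhc-name> is a placeholder for the actual object name:

    # kubectl get machinehealthcheck -n cluster-ns

    # kubectl get machinehealthcheck <mhc-name> -n cluster-ns -o yaml

    The timeout is listed under spec.unhealthyConditions for the entry with type Ready and status "False".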

  • The node does not show high memory or CPU usage from the vCenter side.

  • Searching virtual machines by IP address in vCenter shows another virtual machine with the same IP address:
    • Log in to your vCenter Server using the vSphere Client web interface.
    • Use the global search bar at the top of the interface and simply type the IP address.
    • The search results should filter and display the matching VM/VMs.
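    If the govc CLI is available and already configured against the vCenter Server, the same search can be scripted; the IP below is masked in the same way as the examples above and must be replaced with the node's real IP address:

      # govc find / -type m -guest.ipAddress 10.#.#.50

    More than one result for the same IP address indicates a conflict.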

 

Environment

VMware vSphere Kubernetes Service

Cause

  • The VKS cluster node enters a NotReady state during an IP conflict because the kubelet (the agent on the node) can no longer maintain a reliable heartbeat with the kube-apiserver.

Resolution

  • Correct the IP address duplication.

Additional Information

If searching virtual machines by IP address in vCenter does not show any duplicate IP address, the following steps can help confirm whether the node getting into the NotReady state is due to a duplicate IP address.

  • Disconnect the NIC of the virtual machine (the problematic node that is in the NotReady state) using the VMware Host Client:

    1. Log in: Open the VMware Host Client by entering the ESXi host IP address in a web browser.
    2. Locate VM: Click on Virtual Machines in the navigator pane and select the target VM.
    3. Edit Settings: Click Edit in the top menu bar.
    4. Toggle NIC State:
      • Expand the Network Adapter section.

      • Disconnect: uncheck the Connected box.

    5. Save: Click Save.
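    Alternatively, if the govc CLI is configured against the vCenter Server, the NIC can be disconnected and later reconnected from the command line; the VM name is a placeholder, and the network device name (for example ethernet-0) should be confirmed with device.ls first:

    # govc device.ls -vm <vm-name>

    # govc device.disconnect -vm <vm-name> ethernet-0

    # govc device.connect -vm <vm-name> ethernet-0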
  • Check whether you can still ssh to or ping the problematic node's IP address while its NIC is disconnected (see the example commands after this list).
    • If you can, another machine is using the same IP address; correct the IP address duplication.
    • If you cannot, this is not a duplicate IP address issue, and you will need to log in from the virtual machine console to investigate what is causing the node to get into the NotReady state.
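    For example, from a jump host on the same network, while the node's NIC is still disconnected (the IP is masked as in the examples above and must be replaced; VKS guest cluster nodes typically accept SSH as the vmware-system-user account):

    # ping -c 3 10.#.#.50

    # ssh vmware-system-user@10.#.#.50

    If the arping utility is available on the jump host, it can also show which MAC address is answering for the IP, which helps identify the conflicting machine:

    # arping -c 3 -I eth0 10.#.#.50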