Kube-VIP Not Assigned to Control Plane Node in TKG Cluster — etcd Logs Show "msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT"

Article ID: 404223

Products

VMware Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid Management

Issue/Introduction

In a Tanzu Kubernetes Grid (TKG) cluster, the Kubernetes API server becomes unreachable because the kube-vip virtual IP (VIP) is not assigned to any control plane node. The problem is accompanied by failed etcd operations, frequent container restarts, and general control plane instability.

Symptoms include:

  • The API server is inaccessible

  • The etcd and kube-apiserver containers are in an Exited state or show multiple restarts

  • Commands such as etcdctl fail because etcd is not serving requests

    To verify:

    1. SSH into a control plane VM

    2. Run the following commands (-a also lists exited containers):

      crictl ps -a
      crictl logs <etcd-container-id>
      crictl logs <kube-apiserver-container-id>

    3. Look for repeated messages such as:

      {"level":"warn","ts":"2025-07-14T04:38:22.782Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"33da9c7e9d00e90b","rtt":"470.285µs","error":"dial tcp 10.188.14.201:2380: connect: no route to host"}
      dial tcp <control-plane-IP>:6443: connect: no route to host
      dial tcp <etcd-peer-IP>:2380: i/o timeout
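The log check above can be scripted. The following is a minimal bash sketch, assuming it runs as root on a control plane VM where crictl is already configured for the container runtime. The function names `filter_cp_errors` and `collect_cp_errors` are hypothetical helpers, and the grep patterns are simply the three messages quoted above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): show only the etcd / kube-apiserver log lines
# that match the failure signatures quoted in this article.
# Assumes root on a control plane VM with crictl configured.

# Reads log lines on stdin and prints those matching the known signatures.
filter_cp_errors() {
  grep -E 'prober detected unhealthy status|no route to host|i/o timeout'
}

# For every etcd and kube-apiserver container, including exited ones (-a),
# prints a header followed by the matching lines from its last N log lines.
collect_cp_errors() {
  local tail_lines="${1:-200}" name id
  for name in etcd kube-apiserver; do
    # --name takes a regular expression; anchor it to avoid partial matches.
    for id in $(crictl ps -a -q --name "^${name}\$"); do
      echo "== ${name} ${id} =="
      # Both components log to stderr, so merge it into the pipe.
      crictl logs --tail "$tail_lines" "$id" 2>&1 | filter_cp_errors
    done
  done
}
```

For example, `collect_cp_errors 500` scans the last 500 log lines of each container and prints only the matching ones.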

Environment

VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid

Cause

The issue typically occurs when control plane nodes are recreated or reconfigured and receive new IP addresses. etcd, however, continues to use the original IPs recorded in its static pod manifest, /etc/kubernetes/manifests/etcd.yaml (usually under --advertise-client-urls, and likewise in --initial-advertise-peer-urls), so the members can no longer reach one another and the cluster loses quorum.
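One way to confirm this cause on a node is to compare the IP in the manifest with the address the node currently holds. This bash sketch assumes the manifest path used in the Resolution section and that eth0 is the primary interface; `advertised_ip`, `first_ipv4`, and `check_ip_drift` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): detect a mismatch between the IP etcd
# advertises and the IP the node actually has.
# Assumes the kubeadm-style manifest path and eth0 as the primary NIC.

# Prints the host part of --advertise-client-urls from an etcd manifest.
advertised_ip() {
  local manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}"
  sed -n 's#.*--advertise-client-urls=https\{0,1\}://\([^:,/]*\).*#\1#p' "$manifest" | head -n 1
}

# Reads `ip -4 -o addr show` output on stdin and prints the first IPv4
# address without its prefix length.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Prints OK when the addresses match; reports the drift and fails otherwise.
check_ip_drift() {
  local manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}" dev="${2:-eth0}"
  local want have
  want=$(advertised_ip "$manifest")
  have=$(ip -4 -o addr show dev "$dev" | first_ipv4)
  if [ "$want" = "$have" ]; then
    echo "OK: etcd advertises $want and $dev has $have"
  else
    echo "DRIFT: etcd advertises ${want:-<none>} but $dev has ${have:-<none>}" >&2
    return 1
  fi
}
```

Running `check_ip_drift` with no arguments should print an OK line on a healthy node. It only looks at the first IPv4 address on the interface, so a node that holds extra secondary addresses may need a closer look at the full `ip addr` output.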

Because the members cannot reach each other on the outdated IPs, etcd cannot form a quorum. Without etcd, kube-apiserver cannot serve requests, and kube-vip, whose leader election runs through the API server, cannot assign the virtual IP. The API server endpoint therefore becomes unreachable.
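The quorum arithmetic explains why the cluster stalls instead of degrading. etcd needs a majority of its voting members, floor(n/2) + 1, to elect a leader and commit writes, so a three-node control plane has a quorum of 2 and survives one unreachable member but not two. The hypothetical helpers below just encode that rule in bash:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): etcd's majority rule.
# n voting members need floor(n/2) + 1 of them to agree.

# Prints the quorum size for a cluster of $1 members (integer division floors).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if $2 reachable members out of $1 still form a quorum.
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}

# Prints how many members can become unreachable before quorum is lost.
fault_tolerance() {
  echo $(( $1 - $(quorum_size "$1") ))
}
```

For example, `has_quorum 3 1` fails, which matches the state where only one of three members can still reach the others.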

Resolution

  • Gather network information:
    • SSH into each control plane node and run:

      ip addr
      grep advertise-client-urls /etc/kubernetes/manifests/etcd.yaml
    • Note the MAC address of the eth0 interface and the IP configured under advertise-client-urls.

  • Validate IP address availability:

    • Ensure that the IPs referenced in advertise-client-urls are not currently assigned to any other host.

    • Use ping to verify that the IPs are free. No reply is expected; a reply means another host is using the address:

      ping <IP from advertise-client-urls>
  • Free the old IPs:

    • If the original IPs have been reassigned (e.g., by DHCP to other hosts), release or unassign them in the network infrastructure (DHCP reservation, static binding, etc.).

  • Reboot the control plane nodes one by one:

    • Once the original IPs are confirmed to be available, create a DHCP reservation that binds each node's original IP address (the one listed in advertise-client-urls) to the eth0 MAC address gathered in the first step.

    • Reboot each control plane node individually.

    • Verify that each node picks up the expected IP via DHCP or static reservation.

    • After each reboot, run crictl ps and crictl logs to confirm that the etcd and kube-apiserver containers are running and healthy before moving on to the next node.

  • Monitor cluster recovery:

    • Confirm that etcd has regained quorum.

    • Ensure that the kube-vip VIP is assigned to one of the control plane nodes.

    • Verify that the API server is accessible.
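Several of the Resolution checks can be sketched as bash helpers. This is a minimal sketch under assumptions that do not come from the article: the DHCP server is ISC dhcpd (other DHCP servers use different reservation syntax), the etcd container image ships `etcdctl`, the etcd certificates sit at the kubeadm default paths under /etc/kubernetes/pki/etcd, and eth0 is the node's interface. All function names, and the `PING_CMD` override, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of recovery helpers (all names hypothetical). ASSUMPTIONS, not from
# the article: ISC dhcpd, etcdctl inside the etcd image, kubeadm cert paths.

# Prints the MAC address of an interface (default eth0) from sysfs.
node_mac() {
  cat "/sys/class/net/${1:-eth0}/address"
}

# Succeeds ("looks free") if the IP does not answer ping. PING_CMD is a
# hypothetical override used for testing. Hosts that drop ICMP look free too.
ip_looks_free() {
  ! ${PING_CMD:-ping -c 2 -W 2} "$1" >/dev/null 2>&1
}

# Prints an ISC dhcpd host block reserving IP $3 for MAC $2.
dhcpd_host_entry() {
  local name="$1" mac="$2" ip="$3"
  printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' \
    "$name" "$mac" "$ip"
}

# Runs `etcdctl endpoint health --cluster` inside the running etcd container.
etcd_health() {
  local id
  id=$(crictl ps -q --name '^etcd$' | head -n 1)
  [ -n "$id" ] || { echo "no running etcd container" >&2; return 1; }
  crictl exec "$id" etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health --cluster 2>&1
}

# Reads etcd_health output on stdin; succeeds only if a majority of the
# listed endpoints report healthy (no endpoint lines at all counts as failure).
etcd_quorum_ok() {
  awk '/ is healthy/ { ok++ }
       / is unhealthy/ { bad++ }
       END { n = ok + bad; exit !(n > 0 && ok >= int(n / 2) + 1) }'
}

# Reads `ip -4 -o addr show` output on stdin; succeeds if VIP $1 is present.
vip_present() {
  grep -qF " inet $1/"
}

# Succeeds if the API server behind VIP $1 answers /healthz with "ok".
# Assumes anonymous access to /healthz is enabled (the Kubernetes default).
api_ok() {
  [ "$(curl -sk --max-time 5 "https://$1:6443/healthz")" = "ok" ]
}
```

For example, `etcd_health | etcd_quorum_ok && echo "quorum restored"` on a control plane node, and `ip -4 -o addr show | vip_present <VIP>` on each node, which should succeed on exactly one of them. Because ping only hints that an address is free, it is worth also checking the DHCP server's lease table before creating a reservation.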