API server responds with "connect: no route to host" in TKGm Management or Workload clusters after ControlPlane node reboot

Products

Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Plus
VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

The API server on TKG Management or Workload clusters reports "dial tcp <API_SERVER_VIP>:6443: connect: no route to host" errors.
The cluster is built with multiple ControlPlane nodes.
From an SSH session on a ControlPlane node, etcdctl commands fail with output similar to:

etcdctl member list

{"level":"warn","ts":"2024-06-01T10:17:39.213Z","logger":"etcd-client","caller":"v3@v3.5.6/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc056278940/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

Error: context deadline exceeded

FATA[0005] execing command in container: command terminated with exit code 1
A review of the etcd logs in /var/log/pods/kube-system_etcd-<ETCD_POD_ID>/etcd/0.log will show errors like:

2024-06-01T10:19:01.639764812Z stderr F {"level":"info","ts":"2024-06-01T10:19:01.639Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"25b536d90d424cf9 [logterm: 16, index: 291114770] sent MsgPreVote request to c567a2de68348f39 at term 16"}

2024-06-01T10:19:01.655939437Z stderr F {"level":"warn","ts":"2024-06-01T10:19:01.655Z","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":5546622500406554979,"retry-timeout":"500ms"}

2024-06-01T10:19:01.671368969Z stderr F {"level":"warn","ts":"2024-06-01T10:19:01.671Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c567a2de68348f39","rtt":"7.364093ms","error":"dial tcp 192.168.0.2:2380: connect: no route to host"}

NOTE: The above messaging will be repeated for any peer nodes that were rebooted.
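
To see at a glance which peers etcd cannot reach, the log can be filtered for the "no route to host" error shown above. The following is a minimal sketch, not a supported tool: the function name unreachable_peers is hypothetical, and it assumes the log lines use the same "dial tcp <IP>:2380: connect: no route to host" wording as the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads etcd log lines on stdin and prints each distinct
# peer address (IP:2380) that was reported with "no route to host".
unreachable_peers() {
  grep -o 'dial tcp [0-9.]*:2380: connect: no route to host' \
    | awk '{print $3}' \
    | sed 's/:$//' \
    | sort -u
}

# Example usage on a ControlPlane node (substitute the real etcd pod ID):
#   unreachable_peers < "/var/log/pods/kube-system_etcd-<ETCD_POD_ID>/etcd/0.log"
```

Each address printed is a peer that the local etcd member still expects at its original IP.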
The current IP address of the eth0 interface does not match the "advertise-client-urls" value in the etcd manifest:

# ip addr

# cat /etc/kubernetes/manifests/etcd.yaml | grep "advertise-client-urls"

EXAMPLE:

# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000

    link/ether 00:50:56:10:bb:23 brd ff:ff:ff:ff:ff:ff

    inet 192.168.0.100/24 brd 192.168.0.255 scope global dynamic eth0

# cat /etc/kubernetes/manifests/etcd.yaml | grep "advertise-client-urls"

    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.0.2:2379
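
The comparison above can also be scripted. This is a minimal sketch under stated assumptions: the function names iface_ipv4, advertised_ip and check_node_ip are hypothetical, the interface is eth0, and the manifest contains an "advertise-client-urls" line in the form shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first IPv4 address of interface $1,
# parsed from `ip addr` style output on stdin.
iface_ipv4() {
  awk -v dev="$1" '$1 == "inet" && $NF == dev { sub(/\/.*/, "", $2); print $2; exit }'
}

# Hypothetical helper: print the host part of the first
# advertise-client-urls value found in etcd manifest text on stdin.
advertised_ip() {
  grep -m1 'advertise-client-urls' | sed -E 's#.*https?://([^:/]+).*#\1#'
}

# Hypothetical helper: compare the two addresses; returns non-zero on mismatch.
check_node_ip() {
  local current="$1" advertised="$2"
  if [ "$current" = "$advertised" ]; then
    echo "OK: eth0 address $current matches the etcd manifest"
  else
    echo "MISMATCH: eth0 has $current but etcd advertises $advertised"
    return 1
  fi
}

# Example usage on a ControlPlane node:
#   current="$(ip addr | iface_ipv4 eth0)"
#   advertised="$(advertised_ip < /etc/kubernetes/manifests/etcd.yaml)"
#   check_node_ip "$current" "$advertised"
```

A MISMATCH result on a rebooted node is consistent with the symptom described in this article.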

Environment

TKGm

Cause

This problem is caused by the DHCP lease for eth0 expiring when there is no DHCP reservation for the ControlPlane node IPs. If a node is restarted after its DHCP lease has expired, the DHCP server may assign it a new IP address. etcd is configured with the node IP that was in use when the node joined the cluster, and that value is not updated afterwards. When the node's IP changes, peer nodes can no longer reach it at the recorded address. etcd then loses quorum and stops serving requests. Because the API server depends on etcd, it fails to respond until etcd is corrected.
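
As a rough illustration of the quorum arithmetic: etcd needs a majority of its members, floor(N/2) + 1, to be reachable. The sketch below assumes a hypothetical three-node control plane (this article only states that there are multiple ControlPlane nodes), and the function names quorum_size and has_quorum are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: majority needed for an etcd cluster of $1 members.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: succeeds if $2 reachable members satisfy quorum
# for a cluster of $1 members.
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}

# Assumed scenario: 3 members, 2 of them rebooted onto new IP addresses,
# so only 1 member is still reachable at its recorded address.
if has_quorum 3 1; then
  echo "quorum held"
else
  echo "quorum lost"
fi
```

Under that assumption a single reachable member is below the majority of two, which matches the etcdctl timeouts described above.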

Resolution

1. Gather the eth0 MAC address and the expected IP address for each ControlPlane node by connecting to the node via SSH and running the following commands:

# ip addr
# cat /etc/kubernetes/manifests/etcd.yaml | grep "advertise-client-urls"

2. Ensure the original IP addresses have not been reassigned by pinging each IP address returned by the "advertise-client-urls" command.
3. If any of the original IP addresses have been reassigned, free them before continuing to the next step.
4. Once the original IP addresses are free and confirmed not to be in use, create a DHCP reservation for each node that pairs the original eth0 IP address with the MAC address gathered in step 1.

NOTE: Some DHCP servers require a restart of services to apply reservation changes. Changes to DHCP are outside of VMware by Broadcom's ability to support, unless DHCP is handled by NSX-T. If NSX-T is the DHCP provider, please open a ticket with the NSX-T team for assistance with DHCP reservations.
5. Reboot the ControlPlane nodes one at a time and ensure each node's IP address changes to match the DHCP reservation.
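
The data-gathering and ping checks in the steps above can be sketched as follows. This is a hedged example rather than a supported procedure: the function names iface_mac, ip_in_use and reservation_line are hypothetical, the ping options assume the Linux iputils ping found on the nodes, and a missing ping reply does not prove an address is free (ICMP may be filtered), so confirm against the DHCP server's lease table as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MAC address of interface $1,
# parsed from `ip addr` style output on stdin.
iface_mac() {
  awk -v dev="$1" '
    /^[0-9]+: / { cur = $2; sub(/:$/, "", cur); sub(/@.*/, "", cur) }
    $1 == "link/ether" && cur == dev { print $2; exit }
  '
}

# Hypothetical helper: succeeds if $1 answers ping, meaning the original
# address is probably held by another host and must be freed first.
ip_in_use() {
  ping -c 2 -W 1 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one "MAC IP" pair to hand to whoever
# administers the DHCP server.
reservation_line() {
  printf '%s %s\n' "$1" "$2"
}

# Example usage on a ControlPlane node, where ORIGINAL_IP is the address
# taken from the advertise-client-urls value:
#   mac="$(ip addr | iface_mac eth0)"
#   ip_in_use "$ORIGINAL_IP" && echo "$ORIGINAL_IP still answers ping"
#   reservation_line "$mac" "$ORIGINAL_IP"
```

Collecting one line per node gives a complete list of MAC and IP pairs to reserve before any node is rebooted.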

Additional Information

Refer to the Configure Node DHCP Reservations documentation for further details.