"dial tcp <API_SERVER_VIP>:6443: connect: no route to host" errors on Management or Workload clusters
etcdctl member list fails with a timeout:
{"level":"warn","ts":"2024-06-01T10:17:39.213Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc056278940/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
FATA[0005] execing command in container: command terminated with exit code 1
The etcd log at /var/log/pods/kube-system_etcd-<ETCD_POD_ID>/etcd/0.log will show errors like:
2024-06-01T10:19:01.639764812Z stderr F {"level":"info","ts":"2024-06-01T10:19:01.639Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"25b536d90d424cf9 [logterm: 16, index: 291114770] sent MsgPreVote request to c567a2de68348f39 at term 16"}
2024-06-01T10:19:01.655939437Z stderr F {"level":"warn","ts":"2024-06-01T10:19:01.655Z","caller":"etcdserver/v3_server.go:840","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":5546622500406554979,"retry-timeout":"500ms"}
2024-06-01T10:19:01.671368969Z stderr F {"level":"warn","ts":"2024-06-01T10:19:01.671Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c567a2de68348f39","rtt":"7.364093ms","error":"dial tcp 192.168.0.2:2380: connect: no route to host"}
NOTE: The above messages are repeated for each peer node that was rebooted.
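To see which peers etcd reports as unreachable, the log can be filtered for the dial errors shown above. The following is a minimal sketch rather than an official tool: the function name unreachable_peers is hypothetical, and the pattern assumes the log lines look like the samples in this article.

```shell
# Print each distinct peer address (IP:port) that etcd logged as
# unreachable with "no route to host".
# $1: path to an etcd container log file
unreachable_peers() {
  sed -nE 's#.*dial tcp ([0-9.]+:[0-9]+): connect: no route to host.*#\1#p' "$1" | sort -u
}

# Example usage on a control plane node (replace the placeholder first):
#   unreachable_peers "/var/log/pods/kube-system_etcd-<ETCD_POD_ID>/etcd/0.log"
```

Each address printed is a peer whose IP etcd still expects; a rebooted peer that received a new IP would appear here under its old address.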
When the node IP is compared with the "advertise-client-urls" value in the etcd manifest, they do not match:
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
link/ether 00:50:56:10:bb:23 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.100/24 brd 192.168.0.255 scope global dynamic eth0
# cat /etc/kubernetes/manifests/etcd.yaml | grep "advertise-client-urls"
kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.0.2:2379
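The manual comparison above can be scripted. This is a hedged sketch only: the function names manifest_ip, current_ips and ip_matches_manifest are hypothetical, and it assumes the manifest contains an IPv4 https URL as in the sample output.

```shell
# Print the IP from the first advertise-client-urls entry in a manifest.
# $1: path to the etcd manifest
manifest_ip() {
  sed -nE 's#.*advertise-client-urls.*https?://([0-9.]+):[0-9]+.*#\1#p' "$1" | head -n 1
}

# Print the IPv4 addresses found in `ip addr` style text read from stdin.
current_ips() {
  awk '$1 == "inet" { split($2, a, "/"); print a[1] }'
}

# Return 0 if the manifest IP is one of the addresses on stdin, else 1.
# $1: path to the etcd manifest; stdin: `ip addr` output
ip_matches_manifest() {
  want="$(manifest_ip "$1")"
  current_ips | grep -qxF "$want"
}

# Example usage on a control plane node:
#   ip addr show eth0 | ip_matches_manifest /etc/kubernetes/manifests/etcd.yaml \
#     || echo "mismatch: etcd expects $(manifest_ip /etc/kubernetes/manifests/etcd.yaml)"
```

With the sample values above (node IP 192.168.0.100, manifest IP 192.168.0.2), ip_matches_manifest returns a non-zero status, which indicates the mismatch described in this article.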
TKGm
This problem occurs when the DHCP lease for eth0 expires and there is no DHCP reservation for the control plane node IPs. If a node is restarted after its DHCP lease has expired, a new IP can be assigned to the node. etcd is configured with the IP the node had when it joined the cluster, so when that IP changes, peer nodes fail when they attempt to communicate with the member. As a result, etcd loses quorum and stops functioning. Because the API server depends on etcd, it fails to respond until etcd is corrected.
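Nodes at risk can be spotted by looking for lease-based addresses. The sketch below is an illustration under assumptions: the function name dynamic_addresses is hypothetical, and it relies on iproute2 printing the "dynamic" flag for addresses with a finite lifetime, as in the eth0 sample output in this article. It does not tell you whether a DHCP reservation exists; that must be checked on the DHCP server.

```shell
# Print "<ip> <interface>" for every IPv4 address that `ip addr`
# marks as dynamic (lease-based). Reads `ip addr` style text on stdin.
dynamic_addresses() {
  awk '$1 == "inet" && / dynamic / { split($2, a, "/"); print a[1], $NF }'
}

# Example usage on a control plane node:
#   ip -4 addr show eth0 | dynamic_addresses
```

An address listed here depends on a DHCP lease, so without a reservation it may change after the lease expires and the node restarts.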
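On the affected node, use the following commands to compare the current IP address with the "advertise-client-urls" value in the etcd manifest:
# ip addr
# cat /etc/kubernetes/manifests/etcd.yaml | grep "advertise-client-urls"
Reference the Configure Node DHCP Reservations documentation for further details.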