etcd service on ESXi host fails after the host is renamed
Article ID: 385110
Updated On:
Products
VMware vSphere ESXi
Issue/Introduction
After an ESXi host is renamed, the etcd service fails because it is not automatically reconfigured to use the new hostname.
You may see the following log entries:
/var/log/etcd.log
health check for peer <ETCD_MEMBER> could not connect: dial tcp: lookup <ESXi_HOST> on <ESXi_HOST_IP>:53: no such host
/var/log/vmkernel.log
cpu84:1001394011)VmkAccess: SocketInetConnect:149: etcd: running in etcdDom(49): ipAddr = <IPV6_ADDRESS>::, port = 9: Access denied by vmkernel access control policy
/var/log/clusterAgent.log
No(5) clusterAgent[525412]: WARN grpc: addrConn.createTransport failed to connect to <ESXi_HOST>:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup <ESXi_HOST>: no such host". Reconnecting...
2025-01-01T13:37:35Z No(5) clusterAgent[525412]: ERROR Failed to prepare supervisor state
The ESXi host's etcd configuration file (etcd.yml) still references the host's previous name.
This issue can occur even if WCP/TKG isn't in use.
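The log signatures listed above can be checked for in one pass. The following is a minimal sketch, not a VMware-provided tool: scan_log is a hypothetical helper name, and the search strings are taken from the sample entries above. It assumes the grep available in the ESXi shell supports the -c and -F options.

```shell
# scan_log is a hypothetical helper, not part of ESXi.
# It counts the lines in a log file that contain a fixed string.
scan_log() {
    log="$1"
    pattern="$2"
    if [ ! -r "$log" ]; then
        echo "$log: not readable"
        return 2
    fi
    # -F treats the pattern as a literal string, -c prints the match count.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # assignment from aborting scripts that run with "set -e".
    count=$(grep -cF -- "$pattern" "$log" || true)
    echo "$log: $count matching line(s)"
    # Return success only when at least one line matched.
    [ "$count" -gt 0 ]
}

# Example calls, using strings from the sample log entries:
# scan_log /var/log/etcd.log "no such host"
# scan_log /var/log/vmkernel.log "Access denied by vmkernel access control policy"
# scan_log /var/log/clusterAgent.log "Failed to prepare supervisor state"
```

A non-zero count in /var/log/etcd.log for "no such host" is consistent with this issue, but the same message can also come from an unrelated DNS problem, so treat it as an indicator rather than proof.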
Environment
vSphere 8
Cause
When an ESXi host is renamed, its etcd configuration isn't updated automatically. As a result, the ESXi hosts in the cluster that are etcd members attempt to contact each other using their old hostnames. If there is no DNS record for these hostnames, communication fails and error messages are logged in /var/log/etcd.log.
Resolution
Each vCenter cluster contains three ESXi hosts that are etcd cluster members. The other ESXi hosts in the cluster are not etcd cluster members, so they may not have the etcd.yml configuration file, and the commands below do not need to be run on them.
SSH to each ESXi host and find the etcd.yml configuration file:
find /vmfs/volumes -name etcd.yml -print | head -n 1
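Once the file is located, a quick way to confirm the problem is to check whether it still mentions the old hostname. The following is a minimal sketch under stated assumptions: check_etcd_hostname is a hypothetical helper, old-esx01.example.com is a placeholder for the host's previous name, and nothing is assumed about the layout of etcd.yml beyond it being a text file.

```shell
# check_etcd_hostname is a hypothetical helper, not part of ESXi.
# It prints every line of the given file that contains the given hostname,
# followed by a one-line summary.
check_etcd_hostname() {
    conf="$1"
    name="$2"
    if [ ! -r "$conf" ]; then
        echo "RESULT: $conf is not readable"
        return 2
    fi
    # -n prefixes each match with its line number, -F matches literally.
    if grep -nF -- "$name" "$conf"; then
        echo "RESULT: $conf references $name"
    else
        echo "RESULT: $conf does not reference $name"
    fi
}

# Example (old-esx01.example.com is a placeholder for the previous hostname):
# ETCD_CONF=$(find /vmfs/volumes -name etcd.yml -print | head -n 1)
# check_etcd_hostname "$ETCD_CONF" old-esx01.example.com
```

If the file still references the old name, the host matches the condition described in the Cause section.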
In a vSphere 8 environment, even if no Kubernetes-related service or application is in use, the cluster selects three hosts as etcd nodes and enables the etcdClientComm and etcdPeerComm services.
To check the etcd cluster status on an ESXi host, run:
/usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status