The Etcd service keeps crashing as soon as it starts.
In the clusterAgent logs of the problematic host, we see the error "connection reset by peer":
No(5) clusterAgent[3931896]: INFO Etcd client started watch {"opID": "kvwatch-tlspeertrust", "cli": "0xc0001e81a0", "key": "root/tlspeertrust"}
No(5) clusterAgent[3931896]: INFO Etcd client started watch {"opID": "kvwatch-votingmembersupdated", "cli": "0xc0001e81a0", "key": "root/votingmembersupdated"}
No(5) clusterAgent[3931896]: WARN grpc: addrConn.createTransport failed to connect to {ESXi-FQDN:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp ESXi-FQDN-IP:28383->ESXi-FQDN-IP:2379: read: connection reset by peer". Reconnecting...
No(5) clusterAgent[3931896]: WARN grpc: addrConn.createTransport failed to connect to {ESXi-FQDN:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp ESXi-FQDN-IP:36502->ESXi-FQDN-IP:2379: read: connection reset by peer". Reconnecting...
In Watchdog.log, we see that the service keeps restarting:
watchdog[XXXXXXX]: Started etcdmain with PID=3931911
watchdog[XXXXXXX]: Restarting etcdmain
watchdog[XXXXXXX]: Started etcdmain with PID=3931929
watchdog[XXXXXXX]: Restarting etcdmain
watchdog[XXXXXXX]: Started etcdmain with PID=3931943
watchdog[XXXXXXX]: Restarting etcdmain
watchdog[XXXXXXX]: Started etcdmain with PID=3931969
watchdog[XXXXXXX]: Restarting etcdmain
watchdog[XXXXXXX]: Started etcdmain with PID=3931985
watchdog[XXXXXXX]: Restarting etcdmain
watchdog[XXXXXXX]: Started etcdmain with PID=3931999
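To gauge how often the service is cycling, the start and restart entries can be counted. The following is a minimal sketch, assuming the watchdog lines follow the wording shown in the excerpt above; the function name `summarize_watchdog` and the sample lines are illustrative and not part of any VMware tool.

```python
import re

# Patterns modelled on the watchdog excerpt; other builds may word these differently.
STARTED_RE = re.compile(r"Started (?P<name>\S+) with PID=(?P<pid>\d+)")
RESTART_RE = re.compile(r"Restarting (?P<name>\S+)")


def summarize_watchdog(lines, process="etcdmain"):
    """Count start and restart entries for one process and its distinct PIDs."""
    pids = []
    restarts = 0
    for line in lines:
        started = STARTED_RE.search(line)
        if started and started["name"] == process:
            pids.append(int(started["pid"]))
            continue
        restarting = RESTART_RE.search(line)
        if restarting and restarting["name"] == process:
            restarts += 1
    return {
        "starts": len(pids),
        "restarts": restarts,
        "distinct_pids": len(set(pids)),
    }


# Sample lines copied from the excerpt (shortened to three start entries).
watchdog_sample = [
    "watchdog[XXXXXXX]: Started etcdmain with PID=3931911",
    "watchdog[XXXXXXX]: Restarting etcdmain",
    "watchdog[XXXXXXX]: Started etcdmain with PID=3931929",
    "watchdog[XXXXXXX]: Restarting etcdmain",
    "watchdog[XXXXXXX]: Started etcdmain with PID=3931943",
]
watchdog_summary = summarize_watchdog(watchdog_sample)
print(watchdog_summary)
```

A new PID on every start entry, paired with a restart entry in between, is consistent with a process that exits shortly after launch.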
In the etcd log, we see the peer being added to the cluster and then removed:
In(6) etcd[XXXXXXX]: added member 9e8cfbf3dbf0e555 [https://ESXi-FQDN:2380] to cluster 2fbf9a482d65ed67
In(6) etcd[XXXXXXX]: starting peer 9e8cfbf3dbf0e555...
In(6) etcd[XXXXXXX]: started HTTP pipelining with peer 9e8cfbf3dbf0e555
In(6) etcd[XXXXXXX]: started streaming with peer 9e8cfbf3dbf0e555 (writer)
In(6) etcd[XXXXXXX]: removed member 9e8cfbf3dbf0e555 from cluster 2fbf9a482d65ed67
In(6) etcd[XXXXXXX]: stopping peer 9e8cfbf3dbf0e555...
In(6) etcd[XXXXXXX]: stopped streaming with peer 9e8cfbf3dbf0e555 (writer)
In(6) etcd[XXXXXXX]: stopped streaming with peer 9e8cfbf3dbf0e555 (writer)
In(6) etcd[XXXXXXX]: started streaming with peer 9e8cfbf3dbf0e555 (stream MsgApp v2 reader)
In(6) etcd[XXXXXXX]: stopped HTTP pipelining with peer 9e8cfbf3dbf0e555
In(6) etcd[XXXXXXX]: stopped streaming with peer 9e8cfbf3dbf0e555 (stream MsgApp v2 reader)
In(6) etcd[XXXXXXX]: started streaming with peer 9e8cfbf3dbf0e555 (stream Message reader)
In(6) etcd[XXXXXXX]: stopped streaming with peer 9e8cfbf3dbf0e555 (stream Message reader)
In(6) etcd[XXXXXXX]: stopped peer 9e8cfbf3dbf0e555
In(6) etcd[XXXXXXX]: removed peer 9e8cfbf3dbf0e555
We see "Admission failure in path: host/vim/vmvisor/etcd:etcd" in vmkernel.log:
/var/run/log/vmkernel.log
vmkernel: cpu12:3932082)Admission failure in path: host/vim/vmvisor/etcd:etcd.3932076:uw.3932076
vmkernel: cpu12:3932082)UserWorld 'etcd' 3932076 with cmdline '/usr/lib/vmware/etcd/bin/etcd --config-file=/var/cache/datafiles/esx#cluster_agent_data/etcd.yml', parent 2097917
vmkernel: cpu12:3932082)started from 'init' 2097917 with cmdline '/bin/init', parent 0
vmkernel: cpu12:3932082)uw.3932076 (10380427) requires 4096 KB, asked 4096 KB from etcd (6977) which has 193788 KB occupied and 2820 KB available.
vmkernel: cpu84:3932095)Admission failure in path: host/vim/vmvisor/etcd:etcd.3932093:uw.3932093
vmkernel: cpu84:3932095)UserWorld 'etcd' 3932093 with cmdline '/usr/lib/vmware/etcd/bin/etcd --config-file=/var/cache/datafiles/esx#cluster_agent_data/etcd.yml', parent 2097917
vmkernel: cpu84:3932095)started from 'init' 2097917 with cmdline '/bin/init', parent 0
vmkernel: cpu84:3932095)uw.3932093 (10380454) requires 4096 KB, asked 4096 KB from etcd (6977) which has 192872 KB occupied and 3736 KB available
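The memory lines above can be read as "requested versus available" in the etcd resource pool. The following is a minimal sketch that extracts those figures and computes the shortfall, assuming the line wording matches the excerpt; the names `ADMISSION_RE` and `admission_shortfalls` are illustrative.

```python
import re

# Pattern for the memory line that follows an "Admission failure" entry,
# modelled on the excerpt; other ESXi builds may word it differently.
ADMISSION_RE = re.compile(
    r"requires (?P<required>\d+) KB, asked (?P<asked>\d+) KB from "
    r"(?P<pool>\S+) \(\d+\) which has (?P<occupied>\d+) KB occupied "
    r"and (?P<available>\d+) KB available"
)


def admission_shortfalls(lines):
    """Return one dict per matching line with the pool name and shortfall in KB."""
    results = []
    for line in lines:
        match = ADMISSION_RE.search(line)
        if match is None:
            continue
        required = int(match["required"])
        available = int(match["available"])
        results.append({
            "pool": match["pool"],
            "required_kb": required,
            "available_kb": available,
            "shortfall_kb": max(required - available, 0),
        })
    return results


# Sample lines copied from the vmkernel.log excerpt.
admission_sample = [
    "vmkernel: cpu12:3932082)uw.3932076 (10380427) requires 4096 KB, asked 4096 KB "
    "from etcd (6977) which has 193788 KB occupied and 2820 KB available.",
    "vmkernel: cpu84:3932095)uw.3932093 (10380454) requires 4096 KB, asked 4096 KB "
    "from etcd (6977) which has 192872 KB occupied and 3736 KB available",
]
admission_results = admission_shortfalls(admission_sample)
for entry in admission_results:
    print(entry)
```

For the two sample lines, the request of 4096 KB exceeds the available memory by 1276 KB and 360 KB respectively, which is why the allocation is refused.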
Running the following command shows that one member is not reachable:
/usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
"state": "hosted",
"cluster_id": "ebbbcf4f-8eae-4fe8-85e8-d197a4ffe1c7: domain-c952432",
"is_in_alarm": false,
"alarm_cause": "",
"is_in_cluster": true,
"members": {
"available": true
},
"namespaces": [
{
"name": "root",
"up_to_date": true,
"members": [
{
"peer_address": "ESXi1:2380",
"api_address": "ESXi1:2379",
"reachable": true,
"primary": "yes",
"learner": false
},
{
"peer_address": "ESXi2:2380",
"api_address": "ESXi2:2379",
"reachable": true,
"primary": "no",
"learner": false
},
{
"peer_address": "ESXi3:2388",
"api_address": "ESXi3:2379",
"reachable": false,
"primary": "unknown",
"learner": false
}
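On clusters with many hosts, the unreachable member can be picked out of the status output programmatically. The following is a minimal sketch, assuming the complete command output is valid JSON with the `namespaces` and `members` layout seen in the excerpt above; the function name `unreachable_members` and the trimmed sample document are illustrative.

```python
import json


def unreachable_members(status):
    """List (namespace, peer_address) pairs for members reported as not reachable."""
    found = []
    for namespace in status.get("namespaces", []):
        for member in namespace.get("members", []):
            if not member.get("reachable", False):
                found.append((namespace.get("name"), member.get("peer_address")))
    return found


# Trimmed sample built from the excerpt; the real output contains more fields.
status_sample = json.loads("""
{
  "state": "hosted",
  "is_in_cluster": true,
  "namespaces": [
    {
      "name": "root",
      "up_to_date": true,
      "members": [
        {"peer_address": "ESXi1:2380", "api_address": "ESXi1:2379",
         "reachable": true, "primary": "yes", "learner": false},
        {"peer_address": "ESXi3:2388", "api_address": "ESXi3:2379",
         "reachable": false, "primary": "unknown", "learner": false}
      ]
    }
  ]
}
""")
unreachable = unreachable_members(status_sample)
print(unreachable)
```

A member with "reachable": false and "primary": "unknown" points to the host whose etcd process should be checked first.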
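The Etcd service runs out of memory and keeps crashing. The vmkernel log shows that the etcd resource pool cannot satisfy a 4096 KB request because less than 4096 KB remains available.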
This issue is addressed in vSphere 8.0 U3e.
Update VMware vCenter and VMware vSphere ESXi to 8.0 U3e to resolve this issue.