"The file table of the ramdisk 'tmp' is full. As a result, the file /tmp/Go.[file_name] could not be created by the application 'etcd'" ####-##-##T##:##:##.882Z In(182) vmkernel: cpu##:9#####8)Admission failure in path: host/vim/vmvisor/etcd:etcd.9#####7:uw.9#####7
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: connect: connection refused". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: connect: connection refused". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: operation was canceled". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: operation was canceled". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: operation was canceled". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: connect: connection refused". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: 2025-06-04T05:05:43.483Z WARN clientv3/retry_interceptor.go:62 retrying of unary invoker failed {"target": "endpoint://client-#####-####-###-####-########/hostfqdn:2379", "attempt": 0, "error": "rpc error: code = Unauthenticated desc = etcdserver: invalid auth token"}
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: connect: connection refused". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: 2025-06-04T05:05:43.490Z WARN clientv3/retry_interceptor.go:62 retrying of unary invoker failed {"target": "endpoint://client-#####-####-###-####-########/hostfqdn:2379", "attempt": 0, "error": "rpc error: code = InvalidArgument desc = etcdserver: authentication failed, invalid user ID or password"}
####-##-##T##:##:## No(5) clusterAgent[#####]: WARN grpc: addrConn.createTransport failed to connect to {hostfqdn:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp hostip:2379: connect: connection refused". Reconnecting...
####-##-##T##:##:## No(5) clusterAgent[#####]: 2025-06-04T05:05:43.492Z WARN clientv3/retry_interceptor.go:62 retrying of unary invoker failed {"target": "endpoint://client-#####-####-###-####-########/hostfqdn:2379", "attempt": 0, "error": "rpc error: code = InvalidArgument desc = etcdserver: authentication failed, invalid user ID or password"}
Er(3) etcd[######]: failed to find member ############## in cluster ##############
Er(3) etcd[######]: failed to find member ############## in cluster ##############
etcd[######]: peer e6eaf0202e2e2ba4 became inactive (message send to peer failed)
etcd[######]: failed to dial ############## on stream MsgApp v2 (peer ############## failed to find local node ##############)
etcd[######]: failed to dial ############## on stream Message (peer ############## failed to find local node ##############)
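The "file table of the ramdisk 'tmp' is full" message above indicates that the tmp ramdisk has run out of file-table entries. Current ramdisk utilization on an affected host can be checked from the ESXi shell, for example:

esxcli system visorfs ramdisk list
vdf -h

Both commands list the host's ramdisks, including tmp, and their current usage.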
vSphere ESXi 8.x
When a DKVS (Distributed Key-Value Store) cluster is in an error state, it can generate a large amount of DNS traffic, because the replica hosts constantly retry their connections to each other.
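If needed, this retry-driven DNS traffic can be observed directly on an affected host; a minimal sketch using tcpdump-uw, assuming vmk0 is the management vmkernel interface:

tcpdump-uw -i vmk0 -nn udp port 53

A continuous stream of lookups for the other replica hosts' FQDNs is consistent with this issue.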
This behavior has improved in 8.0U3g compared to previous versions, but it is not yet fully fixed. Significant enhancements are anticipated in the upcoming 9.x releases.
Check whether DKVS is enabled and running on the host:

[root@ESXi:/] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
"state": "hosted",
...
If DKVS is enabled and running, below are 3 workaround options to resolve this issue.

Option 1: Disable DKVS globally on the vCenter Server and restart the vpxd service:

/usr/lib/vmware-vpx/py/xmlcfg.py -f /etc/vmware-vpx/vpxd.cfg set vpxd/clusterStore/globalDisable true
vmon-cli -r vpxd

Option 2: On each affected ESXi host, stop the cluster agent and delete its ConfigStore data, then restart vpxd on the vCenter Server:

/etc/init.d/clusterAgent stop
configstorecli files datafile delete -c esx -k cluster_agent_data
configstorecli files datadir delete -c esx -k cluster_agent_data
vmon-cli -r vpxd

Option 3: Disable DKVS on the vCenter Server to which the affected ESXi hosts are connected, using the Python script attached to this article. Run the command below on the vCenter Server:
python3 dkvs-cleanup.py -d disable -w all-soft -s restart
To verify that DKVS is disabled (the command should return true):

/usr/lib/vmware-vpx/py/xmlcfg.py -f /etc/vmware-vpx/vpxd.cfg get vpxd/clusterStore/globalDisable
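After disabling, it can also be confirmed from the host side that etcd is no longer listening on port 2379 (the port seen in the connection errors above); a quick check from the ESXi shell:

esxcli network ip connection list | grep 2379

With DKVS disabled, this should return no listening socket on port 2379.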
Recommendations:
To re-enable DKVS once the environment has been remediated (for example, after upgrading to a release with the fix), run the attached Python script with the enable option:
python3 dkvs-cleanup.py -d enable -w actions-soft -s restart
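Assuming the script toggles the same vpxd/clusterStore/globalDisable flag, the configuration check shown earlier can be reused to confirm the change; with DKVS re-enabled it should return false:

/usr/lib/vmware-vpx/py/xmlcfg.py -f /etc/vmware-vpx/vpxd.cfg get vpxd/clusterStore/globalDisable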