ESXi host(s) intermittently change state to NO_RESPONSE in vCenter Server due to missed heartbeats. Log analysis reveals a specific cryptographic failure during the SSL handshake process.
ESXi hosts repeatedly disconnect and reconnect.
vpxd.log shows info messages like:
<timedatestamp> info vpxd[#####] [Originator@#### sub=InvtHostCnx opID=HeartbeatStartHandler-########] Missed 11 heartbeats for host [vim.HostSystem:host-######, ###.###.###]
vpxd.log shows warning messages like:
<timedatestamp> warning rhttpproxy [#######] [Originator@#### sub=IO.Connection] Failed to read buffer from stream; SSL(<io_obj p:0x000000##########, h:21, <TCP '##.##.##.## : 443'>, <TCP '##.##.##.## : 40116'>>) e: 104(Connection reset by peer), async: true, duration: 1059msec
<timedatestamp> warning rhttpproxy [#######] [Originator@#### sub=Proxy Req #####] Error reading from client while waiting for header: N7Vmacore15SystemExceptionE (Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.)
<timedatestamp> error vpxd[#####] [Originator@#### sub=IO.Http opID=TaskLoop-host-######] User agent failed to send request; SSL(<io_obj p:0x0000########, h:-1, <TCP '##.#.#.## : 50150'>, <TCP '##.##.#.## : 443'>>), N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac)
VMware Cloud Foundation (VCF)
vCenter Server 7.x / 8.x
ESXi 7.x / 8.x
Network packet corruption is occurring in the transit path between vCenter Server and the ESXi hosts.
The bad record mac error specifically indicates that the SSL/TLS payload was altered or truncated after the sender calculated the Message Authentication Code (MAC), causing the receiver to reject the packet due to integrity failure.
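The integrity check described above can be illustrated with a small shell sketch. It is a simplified stand-in rather than the actual TLS record processing: the key, the payload strings, and the helper name mac_of are made up for illustration, and real TLS derives its own keys and record format. It only shows that a single altered byte yields a different MAC, which is why the receiver rejects the data.

```shell
#!/bin/sh
# Simplified illustration only; KEY and the payloads are made-up values.
KEY="demo-shared-key"

# mac_of (hypothetical helper): print the HMAC-SHA256 of stdin as hex.
mac_of() {
    openssl dgst -sha256 -hmac "$KEY" | awk '{print $NF}'
}

# The sender computes the MAC over the payload it transmits.
sent_payload="heartbeat from host"
sent_mac=$(printf '%s' "$sent_payload" | mac_of)

# Simulate one byte corrupted in transit ("host" becomes "hosu").
received_payload="heartbeat from hosu"
received_mac=$(printf '%s' "$received_payload" | mac_of)

# The receiver recomputes the MAC and compares it with the one that was sent.
if [ "$sent_mac" = "$received_mac" ]; then
    verdict="accepted"
else
    verdict="rejected: bad record mac"
fi
echo "$verdict"
```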
To investigate the virtual path, determine whether the packets are being altered between the point where they enter the host at the vmnic and the point where they are delivered to the vCenter guest VM.
Refer to KB 341568 Packet capture on ESXi using the pktcap-uw tool for instructions on how to perform packet captures:
1. Identify the switchport number of the vCenter Server VM:
   net-stats -l | grep <name of the vCenter server>
   <name of the vCenter server> is the name of the vCenter server in question

2. Identify the vmnic in use by running esxtop and pressing the [n] key.

3. Capture the packets where they enter the host at the vmnic:
   pktcap-uw --uplink vmnic# --capture UplinkRcvKernel --ip <IP Address of the vCenter Server> -o /vmfs/volumes/<Datastore Name>/<Folder Name>/<Host Name>.vmnic#.UplinkRcvKernel.<IP Address of the vCenter Server>.pcapng
   vmnic# is the vmnic identified in the esxtop display
   <IP Address of the vCenter Server> is the IP address of the vCenter Server in question
   <Datastore Name> is the name of the datastore where this pcap will be saved
   <Folder Name> is the name of the folder where this pcap will be saved (example: case_###### where ###### is the Broadcom case number)
   <Host Name> is the name of the ESXi host where the vCenter Server VM in question resides
   <Host Name>.vmnic#.UplinkRcvKernel.<IP Address of the vCenter Server>.pcapng will be the file name of the resulting pcap

4. Capture the packets where they are delivered to the vCenter guest VM:
   pktcap-uw --switchport <Switchport Number> --capture VnicRx -o /vmfs/volumes/<Datastore Name>/<Folder Name>/<Host Name>.switchport<Switchport Number>.VnicRx.pcapng
   <Switchport Number> is from the net-stats -l output
   <Datastore Name> is the name of the datastore where this pcap will be saved
   <Folder Name> is the name of the folder where this pcap will be saved (example: case_###### where ###### is the Broadcom case number)
   <Host Name> is the name of the ESXi host where the vCenter Server VM in question resides
   <Host Name>.switchport<Switchport Number>.VnicRx.pcapng will be the file name of the resulting pcap
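As a convenience, the placeholder substitution for the two capture commands can be sketched as a small shell function that only prints the commands and does not run them. The function name build_capture_commands and every value in the example call (vmnic0, 192.0.2.10, 67108885, datastore1, case_000000, esx01) are hypothetical; replace them with the values found in your own environment.

```shell
#!/bin/sh
# build_capture_commands (hypothetical helper): print the two pktcap-uw
# command lines with the placeholders filled in. Nothing is executed.
# Arguments: vmnic, vCenter IP, switchport number, output directory, host name.
build_capture_commands() {
    vmnic=$1; vc_ip=$2; port=$3; dir=$4; host=$5
    # Capture point 1: packets entering the host at the vmnic.
    printf 'pktcap-uw --uplink %s --capture UplinkRcvKernel --ip %s -o %s/%s.%s.UplinkRcvKernel.%s.pcapng\n' \
        "$vmnic" "$vc_ip" "$dir" "$host" "$vmnic" "$vc_ip"
    # Capture point 2: packets delivered to the vCenter guest VM.
    printf 'pktcap-uw --switchport %s --capture VnicRx -o %s/%s.switchport%s.VnicRx.pcapng\n' \
        "$port" "$dir" "$host" "$port"
}

# Example call with made-up values.
build_capture_commands vmnic0 192.0.2.10 67108885 \
    /vmfs/volumes/datastore1/case_000000 esx01
```

Review the printed commands before running them on the ESXi host. Running both captures over the same time window makes the later comparison meaningful.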
If the packets seen in the --uplink capture (where the packets enter the host at the vmnic) are also seen in the --switchport capture (where the packets are delivered to the vCenter guest VM) and the contents of each packet match each other, then you can rule out the ESXi networking stack (the virtual part of the transit path) as a possible cause of the symptoms.
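One possible way to perform this comparison is sketched below. It is not part of the documented procedure and rests on several assumptions: the pcapng files have been copied to a workstation with tshark (Wireshark) installed, the helper names dump_payloads and segments_missing_or_changed are hypothetical, and TCP segments are not coalesced or re-segmented between the two capture points (offloads such as LRO could produce differences that are not corruption).

```shell
#!/bin/sh
# dump_payloads (hypothetical helper): for capture file $1, print one line per
# TCP segment that carries data to or from IP address $2:
#   source IP, destination IP, absolute sequence number, payload as hex.
dump_payloads() {
    tshark -r "$1" -o tcp.relative_sequence_numbers:FALSE \
        -Y "ip.addr == $2 && tcp.len > 0" \
        -T fields -e ip.src -e ip.dst -e tcp.seq -e tcp.payload \
        | LC_ALL=C sort -u
}

# segments_missing_or_changed (hypothetical helper): print the lines found in
# the first dump ($1) that have no identical line in the second dump ($2).
# Both inputs must be sorted as produced by dump_payloads.
segments_missing_or_changed() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

A possible usage, with file names following the naming pattern above: run dump_payloads on the uplink capture and on the switchport capture, redirect each result to a text file, then pass the uplink dump followed by the switchport dump to segments_missing_or_changed. Empty output suggests that every data segment seen at the vmnic reached the VM unchanged. Any printed line is a segment that was either not delivered or delivered with different contents.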
If the packets do not match as described above, log a case with Broadcom Support to further investigate the virtual part of the transit path, as per the KB "Creating and managing Broadcom cases". Attach the pcap files captured above as well as a support bundle for the ESXi host where the vCenter Server VM resides.