Error "vSphere HA detected a possible host failure of this host" observed after ESXi or NIC firmware upgrade

Article ID: 390275

Products

VMware vSAN

Issue/Introduction

Symptoms:

The cluster is a vSAN cluster.
One of the hosts was upgraded through a cluster image using vSphere Lifecycle Manager (vLCM).
After remediation, the host's Summary tab reports HA-related errors:
"vSphere HA host status"
"vSphere HA detected a possible host failure of this host."
Additionally, the following errors are observed at the cluster level:
"vSAN network alarm 'vSAN MTU check (ping with large packet size)'"
"vSAN network alarm 'vSAN Basic (unicast) connectivity check'"
Reverting the host to an older ESXi version does not fix these errors.
The issue is also seen after a NIC firmware upgrade that leaves one of the active vmnics down. The down vmnic partitions the vSAN cluster, which in turn raises the HA-related errors.
vmkernel.log reports errors like the following when the data nodes fail to reach each other on unicast port 12321 over the vSAN network:

2025-06-03T06:43:37.903Z cpu76:2099736) CMMDSNet: CMMDSNetSendtoUnicastChannels: 1486: Throttled: 52d68c54-dddl-9a5c-0b6f-##########: Failed to send to unicast host '#.#.#.#:12321' on iface '#.#.#.#': Host is down.
2025-06-03T06:43:43.903Z cpu76:2099736) CMMDSNet: CMMDSNetSendtoUnicastChannels: 1486: Throttled: 52d68c54-dddl-9a5c-0b6f-##########: Failed to send to unicast host '#.#.#.#:12321' on iface '#.#.#.#': Host is down.
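
On the affected host, these failures can be summarized per peer to see which data nodes are unreachable. The following is a minimal sketch, assuming the log lines keep the format shown above. The function name cmmds_unicast_failures is hypothetical, and on ESXi the input would typically be /var/log/vmkernel.log.

```shell
# Hypothetical helper (not a VMware tool): count CMMDS unicast send
# failures per peer address in vmkernel.log-style text read from stdin.
# Assumes lines look like the excerpt above:
#   ... Failed to send to unicast host '<ip>:12321' on iface '<ip>': Host is down.
cmmds_unicast_failures() {
  grep 'Failed to send to unicast host' |
    # Keep only the quoted peer address, e.g. 10.0.0.2:12321
    sed -n "s/.*unicast host '\([^']*\)'.*/\1/p" |
    sort | uniq -c |
    # Print "peer count", one peer per line
    awk '{ print $2, $1 }'
}
```

Usage: cmmds_unicast_failures < /var/log/vmkernel.log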

Environment

VMware vSAN 7.x

VMware vSAN 8.x

VMware vSAN 9.x

Cause

These errors are reported due to network partitioning between vSAN nodes.
The upgraded host fails to reach the other vSAN data node through the vSAN network.
Running the vmkping command on the upgraded host fails with the 'Host is down' error.

# vmkping -I vmkY #.#.#.#
PING #.#.#.# (#.#.#.#): 56 data bytes
sendto() failed (Host is down)
- where vmkY is the VMkernel adapter of the upgraded host on which vSAN traffic is enabled.
- where #.#.#.# is the IP of the VMkernel adapter of the other data node on which vSAN traffic is enabled.
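
The 'vSAN MTU check (ping with large packet size)' alarm can be reproduced by hand with a non-fragmented vmkping sized to the configured MTU. The ICMP payload is the MTU minus 28 bytes (a 20-byte IPv4 header plus an 8-byte ICMP header), so 8972 for a 9000-byte MTU and 1472 for 1500. The sketch below only prints the command. The helper name is hypothetical, and it assumes IPv4 and the vmkping options -I (interface), -d (don't fragment) and -s (payload size).

```shell
# Hypothetical helper: print a vmkping command that tests the full MTU
# end to end with the don't-fragment bit set (IPv4 only).
# Usage: vmkping_mtu_cmd <vmk interface> <MTU> <peer IP>
vmkping_mtu_cmd() {
  vmk="$1"; mtu="$2"; peer="$3"
  # 20-byte IPv4 header + 8-byte ICMP header are added to the payload
  payload=$((mtu - 28))
  echo "vmkping -I $vmk -d -s $payload $peer"
}
```

If the plain vmkping succeeds but the large non-fragmented one fails, an MTU mismatch along the path is more likely than a dead uplink.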

Resolution

Of the two physical NICs used for the vSAN-enabled vmkernel adapter, one is found to be faulty.
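
One way to spot the faulty uplink from the ESXi shell is to look for a vmnic whose link status is Down in the output of esxcli network nic list. The sketch below assumes the default table layout of that command (two header lines, then Name, PCI Device, Driver, Admin Status and Link Status as the first five columns). The function name is hypothetical, and a NIC can be faulty while still reporting link Up, so an empty result is inconclusive.

```shell
# Hypothetical helper: list vmnics whose Link Status column reads "Down".
# Assumes `esxcli network nic list` prints two header lines followed by
# rows whose 5th whitespace-separated field is the link status.
down_vmnics() {
  esxcli network nic list | awk 'NR > 2 && $5 == "Down" { print $1 }'
}
```
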
Placing the faulty physical NIC in the 'Unused' state with the following step restores network connectivity between the two vSAN data nodes:
- In the vSphere Client, select the upgraded host > Configure > Virtual Switches, expand the switch (vSwitch1 in this example), and click 'Manage Physical Adapters'. Move vmnicX to the 'Unused adapters' list.

  (Where vmnicX is the suspected faulty physical adapter.)
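
For a standard vSwitch, the same change can likely be made from the ESXi shell with esxcli network vswitch standard policy failover set, listing only the healthy NIC as active. This is a sketch under assumptions: the function name is hypothetical, it assumes an uplink named in neither --active-uplinks nor --standby-uplinks ends up unused, and port groups that override the vSwitch teaming policy (such as the vSAN VMkernel port group) would need the equivalent change at port-group level. Check the result with esxcli network vswitch standard policy failover get -v <vSwitch> before relying on it.

```shell
# Hypothetical helper: make <healthy vmnic> the only active uplink on a
# standard vSwitch so the faulty vmnic no longer carries traffic.
# Assumption: uplinks not listed as active or standby are treated as unused.
# Usage: use_only_uplink <vSwitch> <healthy vmnic>
use_only_uplink() {
  vswitch="$1"; healthy="$2"
  esxcli network vswitch standard policy failover set \
    --vswitch-name="$vswitch" --active-uplinks="$healthy"
}
```
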

Once the faulty NIC is in the 'Unused' state, the remaining active physical NIC takes over the vSAN traffic.
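
To confirm that the partition has cleared, the vSAN cluster membership seen by each host can be compared with the number of hosts in the cluster. A minimal sketch, assuming esxcli vsan cluster get prints a "Sub-Cluster Member Count:" line; the function name and its expected-count argument are hypothetical.

```shell
# Hypothetical helper: compare the vSAN sub-cluster member count this host
# sees with the number of hosts expected in the cluster.
# Usage: check_vsan_membership <expected host count>
check_vsan_membership() {
  expected="$1"
  # Extract the value after "Sub-Cluster Member Count:"
  count=$(esxcli vsan cluster get |
    awk -F': *' '/Sub-Cluster Member Count/ { print $2 }')
  if [ "$count" = "$expected" ]; then
    echo "OK: $count of $expected hosts in the vSAN cluster"
  else
    echo "PARTITIONED: this host sees ${count:-?} of $expected hosts"
    return 1
  fi
}
```
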
Validate ping from each of the data nodes over the vSAN vmkernel network to the other node using the command:

# vmkping -I vmkY #.#.#.#

where vmkY is the vmkernel adapter of the host on which vSAN traffic is enabled.
where #.#.#.# is the IP of the vmkernel adapter of the other data node on which vSAN traffic is enabled.
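
With more than two data nodes, the same check can be repeated for every peer. A minimal sketch; the function name is hypothetical, and it assumes the vmkping options -I (VMkernel interface) and -c (packet count).

```shell
# Hypothetical helper: ping each peer's vSAN VMkernel IP from the given
# local VMkernel adapter and report which peers are unreachable.
# Usage: check_vsan_peers <vmk interface> <peer IP>...
check_vsan_peers() {
  vmk="$1"; shift
  failed=0
  for peer in "$@"; do
    if vmkping -I "$vmk" -c 3 "$peer" >/dev/null 2>&1; then
      echo "OK   $peer"
    else
      echo "FAIL $peer"
      failed=1
    fi
  done
  # Non-zero exit status if any peer failed
  return "$failed"
}
```
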
