Error "vSphere HA detected a possible host failure of this host" observed after ESXi or NIC firmware upgrade

Article ID: 390275

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • The environment is a vSAN cluster.

  • One of the hosts was upgraded through a cluster image using vSphere Lifecycle Manager (vLCM).

  • After remediation, the host reports the following HA-related errors in the Summary tab:
    "vSphere HA host status"
    "vSphere HA detected a possible host failure of this host."



  • Additionally, the following errors are observed at the cluster level:
    "vSAN network alarm 'vSAN MTU check (ping with large packet size)'"
    "vSAN network alarm 'vSAN Basic (unicast) connectivity check'"

  • Reverting the host to an older ESXi version does not fix these errors.


  • The issue is also seen after a NIC firmware upgrade that leaves one of the active vmnics down. This partitions the vSAN cluster, which in turn triggers the HA-related errors.


  • vmkernel.log reports the following errors for the unicast network, showing that data nodes fail to reach each other over port 12321 on the vSAN network:

    2025-06-03T06:43:37.903Z cpu76:2099736) CMMDSNet: CMMDSNetSendtoUnicastChannels: 1486: Throttled: 52d68c54-dddl-9a5c-0b6f-##########: Failed to send to unicast host '#.#.#.#:12321' on iface '#.#.#.#': Host is down.
    2025-06-03T06:43:43.903Z cpu76:2099736) CMMDSNet: CMMDSNetSendtoUnicastChannels: 1486: Throttled: 52d68c54-dddl-9a5c-0b6f-##########: Failed to send to unicast host '#.#.#.#:12321' on iface '#.#.#.#': Host is down.
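
To see which peers the host is failing to reach, the messages above can be filtered out of vmkernel.log. The following is a minimal sketch for the ESXi shell, assuming the log is at /var/run/log/vmkernel.log on the affected host. The function name list_unreachable_vsan_peers is hypothetical and not part of ESXi.

```shell
# Hypothetical helper (not part of ESXi): reads vmkernel.log lines on stdin and
# prints each distinct unicast peer (IP:port) that CMMDSNet failed to send to.
list_unreachable_vsan_peers() {
    grep 'CMMDSNetSendtoUnicastChannels' \
        | sed -n "s/.*Failed to send to unicast host '\([^']*\)'.*/\1/p" \
        | sort -u
}

# Example usage on the affected host (the log path is an assumption):
# list_unreachable_vsan_peers < /var/run/log/vmkernel.log
```

Each line of output is a peer address that can then be tested with vmkping from the vSAN VMkernel adapter.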

Environment

VMware vSAN 7.x

VMware vSAN 8.x

VMware vSAN 9.x

Cause

  • These errors are reported due to network partitioning between vSAN nodes.

  • The upgraded host fails to reach the other vSAN data node through the vSAN network.

  • Running the vmkping command on the upgraded host fails with the 'Host is down' error.

    # vmkping -I vmkY #.#.#.#
    PING #.#.#.#(#.#.#.#): 56 data bytes
    sendto() failed (Host is down)
    • where vmkY is the VMkernel adapter of the upgraded host on which vSAN traffic is enabled
    • where #.#.#.# is the IP address of the VMkernel adapter of the other data node on which vSAN traffic is enabled
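
Because the cluster also raises the 'vSAN MTU check (ping with large packet size)' alarm, it can help to test both a default-size ping and a full-size, non-fragmented ping. The following is a minimal sketch for the ESXi shell. The function name check_vsan_peer and its return codes are hypothetical, and the MTU defaults to 1500 unless a third argument is passed. It uses the vmkping options -I (interface), -c (count), -s (payload size), and -d (set the DF bit).

```shell
# Hypothetical helper: ping a vSAN peer over a given VMkernel adapter, first with
# the default payload, then with a full-size payload and fragmentation disabled.
# $1 = vSAN VMkernel adapter (for example vmkY)
# $2 = vSAN IP of the other data node
# $3 = MTU configured on the vSAN network (assumed 1500 if omitted)
check_vsan_peer() {
    vmk=$1; peer=$2; mtu=${3:-1500}
    payload=$((mtu - 28))   # 20-byte IPv4 header + 8-byte ICMP header
    if ! vmkping -I "$vmk" -c 3 "$peer" >/dev/null 2>&1; then
        echo "basic ping to $peer over $vmk FAILED"
        return 1
    fi
    if ! vmkping -I "$vmk" -c 3 -d -s "$payload" "$peer" >/dev/null 2>&1; then
        echo "large-packet ping ($payload bytes, DF set) to $peer over $vmk FAILED"
        return 2
    fi
    echo "ping to $peer over $vmk OK"
}

# Example usage (placeholders as in the article):
# check_vsan_peer vmkY #.#.#.# 9000
```

A failure at the first step matches the 'Host is down' symptom described here. A failure only at the second step would instead point to an MTU mismatch along the path.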

Resolution

  • Of the two physical NICs used for the vSAN-enabled vmkernel adapter, one is found to be faulty.

  • Placing the faulty physical NIC in the 'Unused' state re-establishes network connectivity between the two vSAN data nodes. Use the following steps:

    • In the vSphere Client, select the host (gbmikvsan01 in this example) > Configure > Virtual Switches > expand the switch (vSwitch1 in this example) > click 'Manage Physical Adapters', then place vmnicX under the 'Unused Adapter' list.

      (Where vmnicX is the suspected physical adapter.)
  • The second active physical NIC takes over once the faulty NIC is placed in the 'Unused' state.

  • Validate ping from each of the data nodes over the vSAN vmkernel network to the other node using the command:

    # vmkping -I vmkY #.#.#.#

where vmkY is the vmkernel adapter of the host on which vSAN traffic is enabled.
where #.#.#.# is the IP of the vmkernel adapter of the other data node on which vSAN traffic is enabled.
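
To help identify which physical adapter is a candidate for the 'Unused' list, the link state of each vmnic can be checked from the ESXi shell. The following is a minimal sketch, not a definitive procedure. The function name list_down_vmnics is hypothetical, and the parsing assumes the default table layout of `esxcli network nic list`, with the NIC name in the first column and Link Status in the fifth. Verify this against the actual output on the host before relying on it.

```shell
# Hypothetical helper: print the names of physical NICs whose link is not Up.
# Assumption: in the default `esxcli network nic list` table, column 1 is the
# NIC name and column 5 is Link Status (Name, PCI Device, Driver, Admin Status,
# Link Status, ...), and none of the first four values contains a space.
list_down_vmnics() {
    esxcli network nic list | awk '$1 ~ /^vmnic/ && $5 != "Up" { print $1 }'
}

# Example usage on the affected host:
# list_down_vmnics
```

A NIC can also be faulty while still reporting its link as Up, so an empty result does not rule out a bad adapter. In that case the vmkping validation above remains the deciding check.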