Witness host in a vSAN stretched cluster shows up as 'Not responding' in vCenter.

Article ID: 390913

Products

VMware vSAN 7.x

Issue/Introduction

Symptoms:

  • The environment is a two-node vSAN stretched cluster running 7.0 U3, and its witness host is in the 'Not responding' state in vCenter.
  • The witness host is still a member of the vSAN cluster, and all objects are healthy.

  • Reconnecting the host fails at 80% with the error:
    "Cannot synchronize host x.x.x.x"

    where x.x.x.x is the vmk0 IP address of the witness host.

  • Restarting the management services on the witness host does not reconnect it to vCenter.

  • Power-cycling the witness host does not help it reconnect.

  • SSH and Host Client logins to the witness host still work.

  • vpxa.log reports the following errors after the power cycle:

    2025-03-12T03:09:55.356Z error vpxa[264943] [Originator@6876 sub=Heartbeat opID=SWI-41a7] Agent can't send heartbeats: Host is down
    2025-03-12T03:10:05.362Z error vpxa[264943] [Originator@6876 sub=Heartbeat opID=SWI-41a7] Agent can't send heartbeats: Host is down
    2025-03-12T03:10:11.914Z info vpxa[264972] [Originator@6876 sub=vpxaInvtHost] Increment master gen. no to (295869): Event:VpxaEventHostd::CheckQueuedEvents
    2025-03-12T03:10:15.367Z error vpxa[264943] [Originator@6876 sub=Heartbeat opID=SWI-41a7] Agent can't send heartbeats: Host is down

  • Removing the host from the vCenter inventory and adding it back fails with the following error:

    "Cannot contact the specified host (x.x.x.x). The host may not be available on the network, a network configuration problem may exist, or the management services on this host may not be responding."

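To gauge how often the agent is failing to send heartbeats, the matching vpxa.log entries can be counted directly on the witness host over SSH. The following is a minimal sketch, not part of the original procedure: the function name `count_heartbeat_errors` is hypothetical, and the default path assumes the log is in the usual ESXi location, /var/log/vpxa.log.

```shell
# Hypothetical helper: count vpxa heartbeat failures in a log file.
# Assumption: the log defaults to /var/log/vpxa.log when no path is given.
count_heartbeat_errors() {
    log_file="${1:-/var/log/vpxa.log}"
    # -c prints the number of matching lines; -F matches the text literally
    grep -cF "Agent can't send heartbeats" "$log_file"
}
```

For example, `count_heartbeat_errors /var/log/vpxa.log` prints the number of heartbeat errors recorded in the current log; a count that keeps rising after a restart of the management services matches the symptom described above.
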
Environment

VMware vSAN 7.x

Resolution

  1. Deploy a new witness host using the same image (7.0.3c) as the faulty witness host.

  2. Place the faulty witness host in maintenance mode with 'No Data Migration'.

    (Note: Maintenance mode with 'Ensure accessibility' fails.)

  3. After the new witness host is deployed, replace the old witness node in the vSAN cluster using 'Change Witness host' under vSAN Cluster > Configure > Fault Domain.

  4. Validate the health of the objects in the cluster using the following command and confirm that all objects are healthy:
    # esxcli vsan debug object health summary get

  5. Power off the old witness host.
  6. Remove the old witness appliance from the vCenter inventory:
    Right-click the witness appliance > 'Remove from Inventory'
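
The health check in step 4 can also be filtered so that only problem states are shown. This is a rough sketch under stated assumptions: the function name `unhealthy_object_states` is hypothetical, and it assumes the summary prints one state per line as a state name followed by an object count, which may differ between builds.

```shell
# Hypothetical helper: read the object health summary on stdin and print
# every state other than "healthy" that has a non-zero object count.
# Assumption: each data line is "<state-name> <count>"; header and
# separator lines are skipped because their second field is not a number.
unhealthy_object_states() {
    awk 'NF == 2 && $1 != "healthy" && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1 ": " $2 }'
}
```

Usage on a data node: `esxcli vsan debug object health summary get | unhealthy_object_states`. No output means that every object was reported as healthy.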