vSAN -- Host cannot communicate with one or more other nodes in the vSAN enabled cluster - vSAN Host is network partitioned

Article ID: 391275

Products

VMware vSAN

Issue/Introduction

One or more of the following Symptoms apply:

  • vSAN Health Service reports a vSAN network partition
  • Virtual machines running on the affected host might be restarted
  • Unable to ping the Host reporting a network partition
  • Host was removed from the cluster and added back, but vSAN no longer works on that Host: the Host has no working vSAN datastore
  • The output of the esxcli vsan cluster get command shows that the Host is partitioned from the Cluster (Sub-Cluster Member Count is 1 and only the local Host is listed as a member):

[root@ESX1:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: YYYY-MM-DDTHH:MM:SS
   Local Node UUID: ############-####-####-####-############
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: ############-####-####-####-############
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: ############-####-####-####-############
   Sub-Cluster Membership Entry Revision: #
   Sub-Cluster Member Count: 1
   Sub-Cluster Member UUIDs: ############-####-####-####-############
   Sub-Cluster Member HostNames: ESXV001.
   Sub-Cluster Membership UUID: ############-####-####-####-############
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: ############-####-####-####-############ ### YYYY-MM-DDTHH:MM:SS
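
The member count in the output above can be checked with a small script. The following is a minimal sketch, not an official tool: the function names vsan_member_count and vsan_check_partition are hypothetical, and the expected member count is a value you supply for your own cluster (in a stretched or 2-node cluster the witness may also count as a member).

```shell
# Read `esxcli vsan cluster get` output on stdin and print the value of
# "Sub-Cluster Member Count". (Hypothetical helper, not part of esxcli.)
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Compare the reported member count with the number of members you expect.
# Returns 0 when they match and 1 when the host appears to be partitioned.
vsan_check_partition() {
    expected="$1"
    actual="$(vsan_member_count)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual of $expected members visible"
    else
        echo "PARTITIONED: ${actual:-unknown} of $expected members visible"
        return 1
    fi
}

# Usage on a host, assuming a cluster with 4 members:
#   esxcli vsan cluster get | vsan_check_partition 4
```

Running the same check on every host shows which hosts share a partition and which one is isolated.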

 

  • The unicast agent lists on all Hosts are complete, that is, each Host lists every other node in the cluster. Sample output:

[root@ESX1:~] esxcli vsan cluster unicastagent list
NodeUuid                                  IsWitness  Supports Unicast  IP Address    Port  Iface Name
----------------------------------------  ---------  ----------------  -----------  -----  ----------
############-####-####-####-############          0              true  10.0.##.###  12321
############-####-####-####-############          1              true  10.0.##.###  12321
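
Because a complete unicast list does not prove that the peers are reachable, each listed address can be tested in turn. This is a minimal sketch under stated assumptions: the function names are hypothetical, vmk1 is assumed to be the vSAN VMkernel interface, and the IP address is assumed to be the fourth whitespace-separated column, as in the sample output above.

```shell
# Read `esxcli vsan cluster unicastagent list` output on stdin and print the
# IP Address column, skipping the header and separator rows.
unicast_peer_ips() {
    awk 'NR > 2 && NF >= 5 { print $4 }'
}

# Ping every listed peer through the vSAN VMkernel interface.
# The interface defaults to vmk1, which is an assumption; pass your own.
ping_unicast_peers() {
    vmk="${1:-vmk1}"
    esxcli vsan cluster unicastagent list | unicast_peer_ips | while read -r ip; do
        if vmkping -I "$vmk" -c 3 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}

# Usage on a host:
#   ping_unicast_peers vmk1
```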

 

  • Examples of Alerts which might be observed: 

 

Environment

VMware vSAN (All Versions)

Cause

The Host could not establish network connectivity with the other nodes in the vSAN cluster because of a problem with the physical network adapter carrying vSAN traffic. The resulting partition triggered vSphere HA to restart the VMs on another node in the vSAN cluster.

The error "sendto() failed (Host is down)" in a vSAN context usually indicates a network connectivity issue between vSAN hosts, often due to network partitioning or faulty physical NICs, preventing vSAN traffic from reaching other nodes. 

[root@ESX1:~] vmkping -I vmk1 10.0.##.###
PING 10.0.##.### (10.0.##.###): 56 data bytes
sendto() failed (Host is down)
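
When testing many hosts, it can help to tell a send failure like the one above apart from ordinary packet loss. The following is an illustrative sketch only: classify_vmkping is a hypothetical helper, and it matches on the message strings shown in this article rather than on any documented exit code.

```shell
# Read vmkping output on stdin and print one of:
#   send-failed  the "sendto() failed" error shown in this article
#   ok           all packets were answered
#   loss         some or all packets were lost
#   unknown      none of the expected strings were found
classify_vmkping() {
    out="$(cat)"
    case "$out" in
        *"sendto() failed"*) echo "send-failed" ;;
        *" 0% packet loss"*) echo "ok" ;;
        *"packet loss"*)     echo "loss" ;;
        *)                   echo "unknown" ;;
    esac
}

# Usage on a host (vmk1 and the peer address are placeholders):
#   vmkping -I vmk1 -c 3 <peer vSAN IP> 2>&1 | classify_vmkping
```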

 

Resolution

To attempt to restore network connectivity between the hosts, bring down the vmnic that is currently active for vSAN so that traffic fails over to the secondary vmnic. Follow the steps below.

  1. Run esxtop and press n to open the network view, then identify which vmnic is actively carrying vSAN traffic
  2. Run esxcli network nic down -n vmnic1 to bring down the vmnic identified in step 1 (vmnic1 is an example)

  3. Test connectivity again with vmkping -I vmk1 10.0.##.###
    PING 10.0.##.### (10.0.##.###): 56 data bytes
    64 bytes from 10.0.##.###: icmp_seq=0 ttl=64 time=0.127 ms
    64 bytes from 10.0.##.###: icmp_seq=1 ttl=64 time=0.165 ms
    64 bytes from 10.0.##.###: icmp_seq=2 ttl=64 time=0.164 ms

    --- 10.0.##.### ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.127/0.152/0.165 ms
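
Steps 2 and 3 can be sketched as a single helper. This is a hedged illustration, not a supported procedure: vsan_failover_test and the FAILOVER_WAIT variable are hypothetical, the 5 second default wait is an arbitrary assumption, and restoring the uplink when the test fails is a design choice of this sketch rather than something the steps above prescribe. Before using it, confirm with esxcli network nic list that a second uplink is present and its link is up.

```shell
# Bring the active uplink down, then test vSAN connectivity over the
# remaining uplink.
# Arguments: <vmnic to bring down> <vSAN vmk interface> <peer vSAN IP>
vsan_failover_test() {
    nic="$1"; vmk="$2"; peer="$3"
    if [ -z "$nic" ] || [ -z "$vmk" ] || [ -z "$peer" ]; then
        echo "usage: vsan_failover_test <vmnic> <vmk> <peer-ip>" >&2
        return 2
    fi

    # Step 2: force traffic onto the secondary uplink.
    esxcli network nic down -n "$nic" || return 1

    # Give the failover a moment to complete (assumed value, adjust as needed).
    sleep "${FAILOVER_WAIT:-5}"

    # Step 3: test connectivity again through the vSAN VMkernel interface.
    if vmkping -I "$vmk" -c 3 "$peer"; then
        echo "Connectivity restored; $nic left down pending network investigation"
    else
        echo "Still unreachable; bringing $nic back up"
        esxcli network nic up -n "$nic"
        return 1
    fi
}

# Usage on a host (all three values are placeholders):
#   vsan_failover_test vmnic1 vmk1 <peer vSAN IP>
```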

 

Engage your Network Vendor/Team for further investigation of the physical network to resolve the networking issue on the primary path.