vSAN -- Host cannot communicate with one or more other nodes in the vSAN enabled cluster

vSAN -- Host cannot communicate with one or more other nodes in the vSAN enabled cluster - vSAN Host is network partitioned

Article ID: 391275

Products

VMware vSAN

Issue/Introduction

One or more of the following Symptoms apply:

vSAN Health Service reports a vSAN network partition
Virtual machines running on the affected host might be restarted
Unable to ping the Host reporting a network partition
The Host was removed and added back, but vSAN no longer works on it: the Host has no working vSAN datastore
The output of the esxcli vsan cluster get command shows the Host is partitioned from the Cluster (Sub-Cluster Member Count is lower than the number of Hosts in the Cluster):

[root@ESX1:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: YYYY-MM-DDTHH:MM:SS
Local Node UUID: ############-####-####-####-############
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: ############-####-####-####-############
Sub-Cluster Backup UUID:
Sub-Cluster UUID: ############-####-####-####-############
Sub-Cluster Membership Entry Revision: #
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: ############-####-####-####-############
Sub-Cluster Member HostNames: ESXV001.
Sub-Cluster Membership UUID: ############-####-####-####-############
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: ############-####-####-####-############ ### YYYY-MM-DDTHH:MM:SS
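
The partition check described above can be scripted. The following is a minimal sketch, assuming the text of esxcli vsan cluster get has already been captured into a string; the function names and the expected_members parameter are hypothetical and not part of any VMware tool, and the sample is an abbreviated copy of the output above.

```python
def parse_cluster_get(text):
    """Turn the 'Key: value' lines of `esxcli vsan cluster get` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_partitioned(text, expected_members):
    """Return True when the host sees fewer members than the cluster should have.

    expected_members is an assumption supplied by the operator: the number of
    nodes that should be in the vSAN cluster.
    """
    info = parse_cluster_get(text)
    return int(info["Sub-Cluster Member Count"]) < expected_members


# Abbreviated sample based on the output above.
sample = """\
Cluster Information
Enabled: true
Local Node State: MASTER
Sub-Cluster Member Count: 1
Unicast Mode Enabled: true
"""

print(is_partitioned(sample, expected_members=3))
```

A host that reports a member count of 1 in a cluster expected to hold three nodes is treated as partitioned by this sketch.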

The unicast agent tables on all Hosts are complete. Sample output:

[root@ESX1:~] esxcli vsan cluster unicastagent list
NodeUuid                                  IsWitness  Supports Unicast  IP Address   Port   Iface Name
----------------------------------------  ---------  ----------------  -----------  -----  ----------
############-####-####-####-############  0          true              10.0.##.###  12321
############-####-####-####-############  1          true              10.0.##.###  12321
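
To confirm that a unicast table is complete, the listing can be compared against the peers the host is expected to know about. This is a minimal sketch, assuming the command output has been captured as text and that columns are whitespace-separated as in the sample above; the function names, the expected_ips parameter, and the UUIDs and IP addresses in the sample are hypothetical placeholders.

```python
def parse_unicast_agents(text):
    """Parse the data rows of `esxcli vsan cluster unicastagent list`."""
    agents = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least five fields with a numeric port in the
        # fifth position; the header and the dashed rule line do not.
        if len(fields) < 5 or not fields[4].isdigit():
            continue
        agents.append({
            "uuid": fields[0],
            "is_witness": fields[1] == "1",
            "supports_unicast": fields[2] == "true",
            "ip": fields[3],
            "port": int(fields[4]),
        })
    return agents


def missing_peers(text, expected_ips):
    """Return the expected peer IPs that do not appear in the table."""
    listed = {agent["ip"] for agent in parse_unicast_agents(text)}
    return sorted(set(expected_ips) - listed)


# Placeholder UUIDs and addresses, for illustration only.
sample = """\
NodeUuid                              IsWitness  Supports Unicast  IP Address  Port   Iface Name
------------------------------------  ---------  ----------------  ----------  -----  ----------
00000000-0000-0000-0000-000000000002  0          true              10.0.0.12   12321
00000000-0000-0000-0000-000000000003  1          true              10.0.0.13   12321
"""

print(missing_peers(sample, ["10.0.0.12", "10.0.0.13"]))
```

An empty result means every expected peer is listed. In that case a partition is more likely to come from the network path than from the unicast configuration.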

Environment

VMware vSAN (All Versions)

Cause

The nodes in the vSAN cluster could not establish network connectivity with each other because of a problem with the physical network adapter carrying vSAN traffic. The resulting partition triggered vSphere HA to restart the VMs on another node in the vSAN cluster.

The error "sendto() failed (Host is down)" in a vSAN context usually indicates a network connectivity issue between vSAN hosts, often due to network partitioning or faulty physical NICs, preventing vSAN traffic from reaching other nodes.

[root@ESX1:~] vmkping -I vmk1 10.0.##.###
PING 10.0.##.### (10.0.##.###): 56 data bytes
sendto() failed (Host is down)
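
When vmkping output is collected from several hosts, the failure signature shown above can be picked out automatically. This is a minimal sketch, assuming the output is available as a string; the function name and the IP address in the sample are hypothetical placeholders.

```python
import re

# Matches lines such as: sendto() failed (Host is down)
SENDTO_ERROR = re.compile(r"sendto\(\) failed \((?P<reason>[^)]*)\)")


def sendto_failures(output):
    """Return the reason text of every 'sendto() failed (...)' line."""
    return [match.group("reason") for match in SENDTO_ERROR.finditer(output)]


# Placeholder address, for illustration only.
sample = """\
PING 10.0.0.12 (10.0.0.12): 56 data bytes
sendto() failed (Host is down)
"""

print(sendto_failures(sample))  # prints ['Host is down']
```

A non-empty list indicates that the packets could not be sent at all, which is a different condition from replies being lost in transit.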

Resolution

To attempt to restore network connectivity between the hosts, bring down the active vmnic that carries vSAN traffic so that the traffic fails over to the secondary vmnic. Follow the steps below.

1. Run esxtop and press n for the networking view to see which vmnic is actively being used for vSAN.
2. Run esxcli network nic down -n vmnic1 to bring down the vmnic identified in step 1 (vmnic1 in this example).
3. Test connectivity again with vmkping -I vmk1 10.0.##.###:
PING 10.0.##.### (10.0.##.###): 56 data bytes
64 bytes from 10.0.##.###: icmp_seq=0 ttl=64 time=0.127 ms
64 bytes from 10.0.##.###: icmp_seq=1 ttl=64 time=0.165 ms
64 bytes from 10.0.##.###: icmp_seq=2 ttl=64 time=0.164 ms

--- 10.0.##.### ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.127/0.152/0.165 ms
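
The result of the connectivity test can also be checked programmatically by reading the statistics line. This is a minimal sketch, assuming the summary line has the format shown in the output above; the function names and the IP address in the sample are hypothetical placeholders.

```python
import re

# Matches: 3 packets transmitted, 3 packets received, 0% packet loss
STATS = re.compile(
    r"(?P<sent>\d+) packets transmitted, "
    r"(?P<received>\d+) packets received, "
    r"(?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def ping_summary(output):
    """Extract sent/received counts and loss percentage, or None if absent."""
    match = STATS.search(output)
    if match is None:
        return None
    return {
        "sent": int(match.group("sent")),
        "received": int(match.group("received")),
        "loss_pct": float(match.group("loss")),
    }


def path_is_healthy(output):
    """True only when a summary exists, packets were sent, and none were lost."""
    summary = ping_summary(output)
    return summary is not None and summary["sent"] > 0 and summary["loss_pct"] == 0.0


# Placeholder address, for illustration only.
sample = """\
--- 10.0.0.12 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.127/0.152/0.165 ms
"""

print(path_is_healthy(sample))
```

Output that contains no statistics line, such as the sendto() failure shown in the Cause section, is reported as unhealthy rather than raising an error.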

Engage your Network Vendor/Team for further investigation of the physical network to resolve the networking issue on the primary path.
