vSAN Node network partitioned

Article ID: 388099

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • A vSAN node was recently upgraded or updated with the latest patch.
  • The vSAN cluster runs mixed ESXi versions, or some hosts are on a lower patch version.
  • Objects are reported in a reduced-availability state.
  • Network health on the vSAN cluster is red or critical.
  • vmkping between the nodes may still succeed.

Validation:

1. The network-partitioned host reports a Sub-Cluster Member Count of 1.

[root@######~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2025-02-07T00:13:45Z
Local Node UUID: ########-####-####-####-########
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: ########-####-####-####-########
Sub-Cluster Backup UUID:
Sub-Cluster UUID: ########-####-####-####-########
Sub-Cluster Membership Entry Revision: 2
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: ########-####-####-####-########
Sub-Cluster Member HostNames: Hostname###
Sub-Cluster Membership UUID: ########-####-####-####-########
Unicast Mode Enabled: true
Maintenance Mode State: OFF
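
The check above can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the expected host count is a value you supply yourself. It reads the command output on standard input and compares the reported member count with the number of hosts you expect in the cluster.

```shell
# Read `esxcli vsan cluster get` output on stdin and print the
# Sub-Cluster Member Count value (hypothetical helper name).
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ {
        gsub(/[ \t\r]/, "", $2)   # strip stray whitespace around the number
        print $2
        exit
    }'
}

# Return 0 (true) when the reported member count is lower than the
# expected host count. The expected count is supplied by the caller;
# it is not read from the cluster.
vsan_is_partitioned() {
    expected_hosts=$1
    count=$(vsan_member_count)
    [ -n "$count" ] && [ "$count" -lt "$expected_hosts" ]
}
```

Possible usage on a host in a 3-node cluster: `esxcli vsan cluster get | vsan_is_partitioned 3 && echo "host is partitioned"`.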

 
2. Objects are reported with reduced availability.
 
[root@######~] esxcli vsan debug object health summary get
Health Status                                              Number Of Objects
---------------------------------------------------------  -----------------
remoteAccessible                                                           0
inaccessible                                                               0
reduced-availability-with-no-rebuild                                     583
reduced-availability-with-no-rebuild-delay-timer                           0
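
To turn the summary above into a single number, the rows whose status begins with reduced-availability can be added up. This is a rough sketch with a hypothetical function name; it assumes the two-column layout shown above, with the object count as the last field of each row.

```shell
# Read `esxcli vsan debug object health summary get` output on stdin and
# print the total number of objects in any reduced-availability state.
vsan_reduced_availability_total() {
    awk '$1 ~ /^reduced-availability/ && $NF ~ /^[0-9]+$/ { total += $NF }
         END { print total + 0 }'
}
```

Possible usage: `esxcli vsan debug object health summary get | vsan_reduced_availability_total`.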

 

3. Network health is reported as red or critical.
 

[root@######~] esxcli vsan health cluster list
Health Test Name                                    Status
--------------------------------------------------  ------
Overall health findings                             red (Network misconfiguration)
Network                                             red
vSAN cluster partition                              red
Cluster                                             yellow
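
When the health list is long, it can help to print only the failing tests. The sketch below uses a hypothetical function name and assumes the layout shown above, where the test name and the status are separated by two or more spaces.

```shell
# Read `esxcli vsan health cluster list` output on stdin and print the
# name of every test whose status starts with "red".
vsan_red_tests() {
    awk '{
        sub(/^[ \t]+/, "")                 # drop any indentation
        n = split($0, col, /  +/)          # columns are split on 2+ spaces
        if (n >= 2 && col[2] ~ /^red/) print col[1]
    }'
}
```

Possible usage: `esxcli vsan health cluster list | vsan_red_tests`.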

 
 
4. vmkping is successful.
 
[root@######:~] vmkping -I vmk1 **.**.**.38 -s 1472
PING **.**.**.38 (1**.**.**.38): 1472 data bytes
1480 bytes from **.**.**.38: icmp_seq=0 ttl=64 time=0.507 ms
1480 bytes from **.**.**.38: icmp_seq=1 ttl=64 time=0.371 ms
1480 bytes from **.**.**.38: icmp_seq=2 ttl=64 time=0.561 ms
 
--- **.**.**.38 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.371/0.480/0.561 ms
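
The payload size of 1472 used above corresponds to a 1500-byte MTU: the 20-byte IPv4 header and the 8-byte ICMP header are subtracted from the MTU. As a small sketch (the function name is hypothetical), the same arithmetic gives the payload size for other MTU values, such as jumbo frames.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}
```

Possible usage for a 9000-byte MTU, adding the `-d` option so that the packet is not fragmented: `vmkping -I vmk1 -d -s "$(icmp_payload_for_mtu 9000)" <peer IP>`. The interface name vmk1 is taken from the example above and may differ in your environment.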

Environment

VMware vSAN 7.0.x
VMware vSAN 8.0.x

Cause

The vSAN node becomes network partitioned because of an incompatible firmware version: the VMNICs used for vSAN traffic have not been updated to a firmware version that is compatible with the installed driver.
 

To check the current driver and firmware version of the VMNIC used for vSAN traffic:

esxcli network nic get -n <vmnic interface>
 
Driver: bnxtnet
Firmware Version: 226.0.145.0 /pkg 226.1.107000
Version: 231.0.153.0
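
To compare several hosts or uplinks quickly, the three fields above can be condensed into one line. This is a minimal sketch with a hypothetical function name; it assumes the driver information is printed as `Name: value` lines, as in the excerpt above, for example by `esxcli network nic get -n <vmnic interface>`.

```shell
# Read NIC driver information on stdin and print
# "driver|firmware version|driver version" on a single line.
nic_driver_summary() {
    awk -F': *' '
        { sub(/^[ \t]+/, "") }              # drop indentation, fields are re-split
        $1 == "Driver"           { drv = $2 }
        $1 == "Firmware Version" { fw  = $2 }
        $1 == "Version"          { ver = $2 }
        END { printf "%s|%s|%s\n", drv, fw, ver }'
}
```

Possible usage: `esxcli network nic get -n vmnic0 | nic_driver_summary`, where vmnic0 stands in for the uplink that carries vSAN traffic.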
 

Resolution

1. Validate the current device driver version. It must be compatible with the firmware version to which the VMNIC will be upgraded.

2. Upgrade the VMNIC firmware to the latest version that the hardware vendor lists as compatible, and validate the driver and firmware combination in the Broadcom Compatibility Guide.

For example, according to the Broadcom Compatibility Guide, the supported firmware for the current device driver (bnxtnet) version 231.0.153.0 is 231.1.162001.
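
Once the supported firmware version has been looked up, the comparison with the installed firmware can be scripted. The sketch below rests on an assumption that is not confirmed by this article: that the value after `/pkg` in the firmware string is the one to compare with the firmware version listed in the compatibility guide. Both function names are hypothetical.

```shell
# Print the package version from a firmware string such as
# "226.0.145.0 /pkg 226.1.107000". Falls back to the whole string
# when no "/pkg" part is present.
firmware_pkg_version() {
    case $1 in
        *"/pkg "*) echo "${1##*/pkg }" ;;
        *)         echo "$1" ;;
    esac
}

# Return 0 (true) when the installed firmware package version ($1) equals
# the supported version looked up in the compatibility guide ($2).
# Assumption: the "/pkg" value is the right one to compare.
firmware_is_supported() {
    [ "$(firmware_pkg_version "$1")" = "$2" ]
}
```

With the values from this article, `firmware_is_supported "226.0.145.0 /pkg 226.1.107000" "231.1.162001"` returns false, which indicates that the firmware still needs to be upgraded.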

Additional Information

1. The other hosts may continue to see all cluster members until the impacted host is placed in maintenance mode.

2. Rebooting the host, or placing it in maintenance mode and exiting again, may not help.

3. In environments with a standby vmnic configured, it is recommended to test the failover settings by bringing the active vmnic down and validating that the standby vmnic takes over. ESXTOP with the n option can be used to check the network traffic statistics. This helps isolate network-related issues.

4. Once you start upgrading a vSAN cluster, complete the upgrade as soon as possible, preferably within a week. Mixed ESXi versions in the same cluster, especially across major releases, are not a supported configuration and can cause performance problems and cluster instability, because hosts running different code levels communicate within the same cluster. Mixed versions are supported ONLY during an upgrade, which is typically expected to complete within 24-48 hours for clusters of fewer than 32 hosts and within 48-72 hours for large clusters of 32-64 hosts.

5. You may proceed with the ESXi upgrade on the other nodes, because partition issues are expected behavior in a 3-node vSAN cluster. However, it is better to make the device driver and firmware versions compatible before the upgrade, so that the partition issue can be isolated if further investigation is required.
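
The failover check described in item 3 can be sketched as a short sequence. This is an illustration under assumptions, not a documented procedure: the function names, the DRY_RUN and HOLD_SECONDS variables, and the uplink name are hypothetical. By default the sketch only prints the commands, so that they can be reviewed before anything is brought down.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take the active uplink down, wait, then bring it back up.
# $1 is the active uplink, for example vmnic0 (a hypothetical name).
failover_test() {
    active_nic=$1
    run_or_print esxcli network nic down -n "$active_nic"
    # While the uplink is down, watch traffic on the standby uplink
    # in a second session (esxtop, then press n).
    run_or_print sleep "${HOLD_SECONDS:-30}"
    run_or_print esxcli network nic up -n "$active_nic"
}
```

Running `failover_test vmnic0` prints the three commands. Setting `DRY_RUN=0` runs them, which interrupts traffic on that uplink, so this is best done on a host that is already in maintenance mode.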

 

  • Use the following command to get the vendor ID details of the VMNICs. Search for these IDs in the Broadcom Compatibility Guide, with the current ESXi version selected, to get accurate device driver and firmware version details.

vmkchdev -l | grep -i vmnic
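
The IDs needed for the compatibility guide search can be split out of that output. The sketch below uses a hypothetical function name and assumes that each line has the layout `<PCI address> <VID>:<DID> <SVID>:<SSID> <owner> <alias>`, with the vmnic alias as the last field.

```shell
# Read `vmkchdev -l` output on stdin and print, for each vmnic,
# "alias VID DID SVID SSID".
vmnic_pci_ids() {
    awk '$NF ~ /^vmnic[0-9]+$/ {
        split($2, dev, ":")      # vendor ID : device ID
        split($3, subsys, ":")   # sub-vendor ID : sub-device ID
        print $NF, dev[1], dev[2], subsys[1], subsys[2]
    }'
}
```

Possible usage: `vmkchdev -l | vmnic_pci_ids`.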