vSAN Health Service - Network Health - vSAN Cluster Partition

Article ID: 318839

Products

VMware vSAN

Issue/Introduction

Symptoms:
This article explains the Network Health – vSAN Cluster Partition check in the vSAN Health Service and provides details on why it might report an error.

Environment

VMware vSAN

Resolution

Q: What does the Network Health – vSAN Cluster Partition check do?

To function properly, all vSAN hosts must be able to communicate with each other over the vSAN network.
 
If some ESXi hosts in the cluster cannot communicate with the others, the vSAN cluster splits into multiple network partitions, that is, sub-groups of ESXi hosts that can talk to each other but not to other sub-groups.
 
When this occurs, vSAN objects may become unavailable until the network misconfiguration is resolved. For smooth operation of production vSAN clusters, it is very important to have a stable network with no extra network partitions (that is, only one partition).
 
This health check examines the cluster to see how many partitions exist. It displays an error if there is more than a single partition in the vSAN cluster. Note that this check only determines whether there is a network issue; it does not attempt to find the root cause. Other network health checks are required to find the root cause.
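The idea of counting partitions can be illustrated with a minimal Python sketch. It is not the health service's actual implementation: the host names, the `links` list of host pairs that can communicate, and the function names are all hypothetical, and reachability is assumed to be bidirectional and transitive.

```python
from collections import defaultdict


def find_partitions(hosts, links):
    """Group hosts into partitions: sets of hosts that can reach each other.

    hosts: iterable of host names (hypothetical inventory).
    links: iterable of (host_a, host_b) pairs that can communicate over
           the vSAN network; assumed bidirectional for this sketch.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    partitions = []
    seen = set()
    for host in hosts:
        if host in seen:
            continue
        # Depth-first walk collects every host reachable from this one.
        group, stack = set(), [host]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        partitions.append(sorted(group))
    return partitions


def check_status(partitions):
    """Mirror the rule in the text: OK only with exactly one partition."""
    return "OK" if len(partitions) == 1 else "ERROR"


# Hypothetical four-host cluster in which esx-04 cannot reach the others.
hosts = ["esx-01", "esx-02", "esx-03", "esx-04"]
links = [("esx-01", "esx-02"), ("esx-02", "esx-03")]
partitions = find_partitions(hosts, links)
print(partitions, check_status(partitions))
```

In this example the three connected hosts form one partition and esx-04 forms a second, so the sketch reports an error, matching the "more than a single partition" rule above.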

Q: What does it mean when it is in an error state?

This health check reports OK when only a single partition is found. As soon as multiple partitions are discovered, the cluster is considered unhealthy.
 
There are likely to be other warnings displayed in the vSphere Web Client when a multiple partition issue occurs. For example, the network configuration status in the vSAN General view is likely to state network misconfiguration detected.
 
Another useful view is vSAN Disk Management. It contains a column that shows the network partition group to which each ESXi host belongs. To see how many partitions the cluster has been split into, examine this column. If each ESXi host is in its own network partition group, then there is a cluster-wide issue. If only one ESXi host is in its own network partition group and all other ESXi hosts are in a different network partition group, then only that ESXi host has the issue. This may help to isolate the issue at hand and focus the investigation effort.

Note: The health user interface displays the same information in the details section of this check.
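The way of reading the network partition group column described above can be sketched in Python. This is only an illustration: the `host_to_group` mapping, the group labels, and the function name are hypothetical, and the values would have to be copied by hand from the Disk Management view or the health check details.

```python
from collections import Counter


def interpret_partition_groups(host_to_group):
    """Classify the pattern seen in the network partition group column.

    host_to_group: dict mapping host name -> partition group label
                   (hypothetical data transcribed from the UI).
    """
    sizes = Counter(host_to_group.values())
    if len(sizes) <= 1:
        return "healthy: all hosts share one partition group"
    if all(size == 1 for size in sizes.values()):
        return "cluster-wide issue: every host is in its own group"
    # Hosts that sit alone in their partition group.
    isolated = sorted(h for h, g in host_to_group.items() if sizes[g] == 1)
    if len(sizes) == 2 and len(isolated) == 1:
        return f"single-host issue: {isolated[0]} is isolated"
    return "multiple partitions: compare the hosts within each group"


groups = {
    "esx-01": "Group 1",
    "esx-02": "Group 1",
    "esx-03": "Group 1",
    "esx-04": "Group 2",
}
print(interpret_partition_groups(groups))
```

With the sample data, only esx-04 is in a group of its own, so the investigation would start with that host's vSAN network configuration.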
 

Q: How does one troubleshoot and fix the error state?

The network configuration issue needs to be located and resolved. Additional health service checks on the network are designed to assist you in finding the root cause of the network partition. The reasons can range from misconfigured subnets (all ESXi hosts must have matching subnets), misconfigured vSAN traffic VMkernel adapters (all ESXi hosts must have a vSAN vmknic configured), misconfigured VLANs, or general network communication issues, to specific multicast issues (all ESXi hosts must have matching multicast settings). The additional network health checks are designed to isolate which of those issues may be the root cause, and should be viewed in parallel with this health check. If the environment is a stretched cluster, refer to the vSAN Stretched Cluster Configuration Guide to see if any additional static routes are required.
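As a rough illustration of the subnet and vmknic checks mentioned above, the following Python sketch groups hosts by the subnet of their vSAN VMkernel adapter and lists hosts that have none. The `vmknics` inventory, its addresses, and the function name are hypothetical; the data would need to be gathered separately from each host. In stretched or routed clusters, differing subnets may be intentional, so a mismatch is only a hint and not proof of a misconfiguration.

```python
import ipaddress
from collections import defaultdict


def group_hosts_by_subnet(vmknics):
    """Group hosts by the subnet of their vSAN vmknic.

    vmknics: dict mapping host name -> "address/prefix" string for the
             vSAN vmknic, or None if the host has no vSAN vmknic
             (hypothetical, manually collected inventory).
    Returns (subnet -> list of hosts, list of hosts without a vmknic).
    """
    by_subnet = defaultdict(list)
    missing = []
    for host, cidr in vmknics.items():
        if cidr is None:
            missing.append(host)
            continue
        # ip_interface keeps the host address and derives its network.
        network = ipaddress.ip_interface(cidr).network
        by_subnet[str(network)].append(host)
    return dict(by_subnet), missing


vmknics = {
    "esx-01": "192.168.10.11/24",
    "esx-02": "192.168.10.12/24",
    "esx-03": "192.168.20.13/24",
    "esx-04": None,
}
by_subnet, missing = group_hosts_by_subnet(vmknics)
print(by_subnet)
print("hosts without a vSAN vmknic:", missing)
```

Here esx-03 sits in a different subnet from esx-01 and esx-02, and esx-04 has no vSAN vmknic at all; both would be candidates for closer inspection.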
 
Aside from misconfigurations, partitions can also occur when the network is overloaded and drops a substantial number of packets. vSAN can tolerate a small amount of packet loss, but once drops rise beyond a moderate level, performance issues may ensue.
 
If none of the misconfiguration checks indicate an issue, it is advisable to watch the dropped packet counters and to perform a proactive network performance test. Proactive network performance tests, which may be initiated from RVC, are discussed in the vSAN Health Services Guide.
 
To examine the dropped packet counters on an ESXi host, use the esxtop network view (press n) and examine the %DRPRX field for excessive dropped packets. You may also need to watch the switch and switch ports, as they may also drop packets. Another metric to check is the number of pause frames: an excessive amount can slow down the network and impact performance.
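If several %DRPRX readings are noted down per port while watching esxtop, a small Python sketch like the one below can help spot ports with persistently high drops. The port names, the readings, and the 1.0 percent threshold are all hypothetical; the article does not define what counts as excessive, so the threshold should be chosen to suit the environment.

```python
def ports_with_excessive_drops(samples, threshold_pct):
    """Return ports whose average %DRPRX exceeds threshold_pct.

    samples: dict mapping port name -> list of %DRPRX readings in percent
             (hypothetical values noted from the esxtop network view).
    threshold_pct: illustrative cut-off, not a documented vSAN limit.
    """
    flagged = {}
    for port, readings in samples.items():
        if not readings:
            continue  # nothing recorded for this port
        average = sum(readings) / len(readings)
        if average > threshold_pct:
            flagged[port] = average
    return dict(sorted(flagged.items()))


samples = {
    "vmk2": [0.0, 0.0, 0.1],
    "vmnic1": [2.0, 4.0, 3.0],
}
print(ports_with_excessive_drops(samples, threshold_pct=1.0))
```

Averaging over several readings avoids reacting to a single short burst; a port flagged this way is a reason to also inspect the corresponding physical switch port.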


Additional Information

For more information on collecting VMware vSAN Logs, see Collecting vSAN support logs and uploading to VMware (2072796).

Also, see:

vSAN Health Service - Cluster Health - Advanced vSAN configuration in sync
vSAN Health Service - Network Health - Hosts disconnected from vCenter Server
vSAN Health Service - Network Health - Unexpected vSAN cluster members
vSAN Health Service - Network Health – Hosts with vSAN disabled
vSAN Health Service - Network Health - All hosts have a vSAN vmknic configured
vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check)
vSAN Health Service - Network Health - Hosts with connectivity issues
vSAN Health Service - Data Health – vSAN Object Health
vSAN Health Service - Physical Disk Health - Overall Disk Health
vSAN Health Service - Limits Health – Current Cluster Situation
vSAN Health Service - Limits Health – After one additional host failure
vSAN Health Service - Physical Disk Health - Disk Capacity
vSAN Health Service – Physical Disk Health – Software State Health
vSAN Health Service – Physical Disk Health – Component Metadata Health
vSAN Health Service - Physical Disk Health – Congestion
vSAN Health Service - Physical Disk Health – Memory pools
vSAN Health Service - vSAN HCL Health - Controller Release Support
vSAN Health Service – vSAN HCL Health – Controller Driver
vSAN Health Service - vSAN HCL Health – vSAN HCL DB up-to-date
vSAN Health Service - vSAN HCL Health – SCSI Controller on vSAN HCL
vSAN Health Service - Cluster Health – CLOMD liveness check
vSAN Health Check Information