vSAN 8.0.x Stretched Cluster: Network Partition Error with Missing Witness Node

Article ID: 430705

Updated On:

Products

VMware vSAN

Issue/Introduction

A VMware vSAN 8.0.x Original Storage Architecture (OSA) or Express Storage Architecture (ESA) stretched cluster may report a persistent Network Partition error. In the vSAN Skyline Health UI, the witness node appears missing from the sub-cluster membership, often resulting in objects entering a reduced-availability-with-no-rebuild state. This condition can persist even when basic ICMP ping connectivity is established, typically due to network-level blocking of essential transport ports or an MTU mismatch along the physical routing path.

Environment

  • VMware vSAN 8.0.x (OSA and ESA)
  • 2-node or Stretched Cluster configuration

Cause

The partition is generally caused by the inability of the Unicast Agent to maintain membership due to:

  1. MTU Mismatch: A device on the physical network path cannot pass full-size, non-fragmented frames at the configured MTU (1500 or 9000 bytes).
  2. Firewall Restrictions: Security rules are blocking specific vSAN clustering and transport ports between the data site and the witness site.

Resolution

Restore bidirectional communication across all required vSAN ports and ensure MTU consistency.

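1. Validate MTU Path Consistency: Confirm that the network path passes full-size, non-fragmented frames between the master data node and the witness. Run the command that matches the configured MTU from an ESXi data host, replacing vmk#### with the VMkernel interface that carries traffic to the witness. The -s value is the MTU minus 28 bytes of IPv4 and ICMP headers, and -d sets the don't-fragment bit: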

# For MTU 1500:
vmkping -I vmk#### -d -s 1472 [Witness_IP]

# For MTU 9000:
vmkping -I vmk#### -d -s 8972 [Witness_IP]

Note: If the ping fails, check the MTU configured on every physical switch port and routed interface along the path.
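
The payload sizes used above follow from simple header arithmetic. The sketch below assumes IPv4 with a 20-byte header (no options) and the 8-byte ICMP header. The function name icmp_payload_for_mtu is introduced here for illustration only, and vmk#### and [Witness_IP] remain placeholders.

```shell
#!/bin/sh
# Sketch: derive the vmkping payload size (-s) for a given MTU.
# Assumes IPv4 (20-byte header, no options) plus the 8-byte ICMP header.
icmp_payload_for_mtu() {
    mtu="$1"
    echo $(( mtu - 20 - 8 ))
}

# Print the matching vmkping command for each common MTU.
# vmk#### and [Witness_IP] are placeholders, exactly as in the article.
for mtu in 1500 9000; do
    echo "vmkping -I vmk#### -d -s $(icmp_payload_for_mtu "$mtu") [Witness_IP]"
done
```

For a path with a different MTU, the same subtraction gives the largest payload that should pass without fragmentation.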

2. Open Required Firewall Ports: Ensure the following ports are open bidirectionally between all vSAN data nodes and the witness node:

  • UDP 12321: vSAN Unicast Agent (Heartbeats)
  • UDP/TCP 12345 and 23451: vSAN Clustering Service (CMMDS)
  • TCP 2233: vSAN Transport Service (Reliable Datagram Transport, RDT)

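3. Verify Port Accessibility: Once the ports are open, test reachability from each ESXi data host with the nc (netcat) command. Because UDP is connectionless, a successful nc -u -z result is indicative only and does not prove that the packet reached the witness: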

nc -u -z <witness-ip> 12321
nc -u -z <witness-ip> 12345
nc -u -z <witness-ip> 23451
nc -z <witness-ip> 2233
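
The individual probes can be generated from one port list so that none is missed. The following is a minimal sketch, not a supported tool. WITNESS_IP and RUN are hypothetical variables introduced for this example, and the TCP 2233 probe is taken from the port list in step 2. By default the script only prints the commands; setting RUN=1 on an ESXi host executes them.

```shell
#!/bin/sh
# Sketch: print (or run) one nc probe per vSAN port.
# WITNESS_IP and RUN are hypothetical variables for this example only.
WITNESS_IP="${WITNESS_IP:-<witness-ip>}"

# Emit "protocol port" pairs: the UDP ports probed in the article plus TCP 2233.
vsan_ports() {
    echo "udp 12321"
    echo "udp 12345"
    echo "udp 23451"
    echo "tcp 2233"
}

# Build the nc command line for one protocol/port pair.
nc_command() {
    proto="$1"; port="$2"
    if [ "$proto" = "udp" ]; then
        echo "nc -u -z $WITNESS_IP $port"
    else
        echo "nc -z $WITNESS_IP $port"
    fi
}

vsan_ports | while read -r proto port; do
    cmd=$(nc_command "$proto" "$port")
    if [ "${RUN:-0}" = "1" ]; then
        # Run the probe; a zero exit status from a UDP probe is not proof of delivery.
        if $cmd; then echo "OK   $cmd"; else echo "FAIL $cmd"; fi
    else
        # Dry run: only show the command that would be executed.
        echo "$cmd"
    fi
done
```

Run the same probes from every data host, since a firewall rule may permit one source address and block another.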

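4. Confirm Heartbeat Traffic: Use the pktcap-uw command to confirm that heartbeat packets flow in both directions, from the data nodes to the witness and from the witness to the data nodes. Run the capture on a data node and on the witness host: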

pktcap-uw --vmk vmk#### --dir 2 -o - | tcpdump-uw -enr - | grep -i 12321
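
A plain grep for 12321 can also match the same digits elsewhere in a line. As a hedged alternative, the sketch below matches the number only where tcpdump prints a port, and covers all three UDP ports from step 2. It assumes the usual "addr.port > addr.port:" layout of IPv4 lines; filter_vsan_udp is a hypothetical helper name, and the two input lines are made up for illustration rather than captured output.

```shell
#!/bin/sh
# Sketch: keep only tcpdump-uw lines whose source or destination port is a vSAN UDP port.
# Assumes the usual tcpdump "addr.port > addr.port:" layout for IPv4 lines.
filter_vsan_udp() {
    grep -E '\.(12321|12345|23451)( >|:)'
}

# Illustrative input (invented for this example); only the first line should pass the filter.
printf '%s\n' \
  'IP 192.0.2.10.12321 > 192.0.2.20.12321: UDP, length 200' \
  'IP 192.0.2.10.443 > 192.0.2.20.51234: tcp 0' \
  | filter_vsan_udp
```

On a host, the helper would take the place of the final grep stage in the capture pipeline above.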

5. Monitor Recovery: Trigger a manual Skyline Health check. The witness node should automatically rejoin the sub-cluster, and the partition alert should clear.
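
Membership can also be checked from the command line. The sketch below assumes that the output of esxcli vsan cluster get includes a "Sub-Cluster Member Count:" line; the helper names member_count and check_membership are hypothetical, and the sample input line is made up for illustration. The expected count is the number of data hosts plus one witness.

```shell
#!/bin/sh
# Sketch: read "esxcli vsan cluster get" output on stdin and print the member count.
member_count() {
    sed -n 's/^ *Sub-Cluster Member Count: *//p'
}

# Compare the reported count with the expected one (data hosts plus one witness).
check_membership() {
    expected="$1"
    actual=$(member_count)
    if [ "$actual" = "$expected" ]; then
        echo "membership OK ($actual of $expected)"
    else
        echo "membership incomplete (${actual:-unknown} of $expected)"
        return 1
    fi
}

# Illustrative run for a 2-node cluster (2 data hosts + 1 witness = 3).
printf '   Sub-Cluster Member Count: 3\n' | check_membership 3
```

On an ESXi host the equivalent call would be esxcli vsan cluster get | check_membership 3, adjusting the expected count to the size of the cluster.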
