vSAN Witness Partitioned from Stretched Cluster Due to Firewall Blocking UDP 12321

Article ID: 415689

Products

VMware vSAN

Issue/Introduction

In a vSAN stretched cluster configuration (Management and Workload clusters), Skyline Health reports the witness appliance as partitioned. As a result, multiple vSAN objects can show a status of Reduced Availability with No Rebuild.

Symptoms

  • Skyline Health reports:
    • vSAN cluster partition – Witness appliance partitioned
    • vSAN objects in Reduced Availability with No Rebuild
  • High latency observed when performing vmkping from data nodes to the witness appliance.
  • Cluster partition noted on stretched clusters.

Validation

Connectivity Tests:

  • Data node ↔ Data node: Successful, low latency

[root@esxihost:~] vmkping -I vmk3 10.##.##.3
PING 10.##.##.3 (10.##.##.3): 56 data bytes
64 bytes from 10.##.##.3: icmp_seq=0 ttl=64 time=0.159 ms
64 bytes from 10.##.##.3: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 10.##.##.3: icmp_seq=2 ttl=64 time=0.119 ms

--- 10.##.##.3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.119/0.136/0.159 ms

  • Data node ↔ Witness: Successful ICMP response but with high RTT

[root@esxihost:~] vmkping -I vmk5 10.##.##.11 -s 1472
PING 10.##.##.11 ( 10.##.##.11): 1472 data bytes
1480 bytes from  10.##.##.11: icmp_seq=0 ttl=53 time=18.261 ms
1480 bytes from  10.##.##.11: icmp_seq=1 ttl=53 time=17.592 ms
1480 bytes from  10.##.##.11: icmp_seq=2 ttl=53 time=17.826 ms

---  10.##.##.11 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 17.592/17.893/18.261 ms
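When several hosts need to be compared, the average RTT can be pulled out of the vmkping summary line instead of being read by eye. The following is a minimal sketch that assumes the summary format shown above; the function names avg_rtt and rtt_over are hypothetical helpers, and the limit passed to rtt_over is whatever value the administrator chooses to treat as high.

```shell
# avg_rtt: read ping/vmkping output on stdin and print the average RTT in ms.
# Assumes the "round-trip min/avg/max = a/b/c ms" summary line shown above.
avg_rtt() {
    sed -n 's|^round-trip min/avg/max = [^/]*/\([^/]*\)/.*|\1|p'
}

# rtt_over LIMIT_MS: return 0 if the average RTT on stdin exceeds LIMIT_MS,
# non-zero otherwise (including when no summary line is found).
rtt_over() {
    limit="$1"
    avg=$(avg_rtt)
    [ -n "$avg" ] && awk -v a="$avg" -v l="$limit" 'BEGIN { exit !(a + 0 > l + 0) }'
}
```

For example, `vmkping -I vmk5 -c 3 <witness IP> | avg_rtt` prints only the average value, which makes it easier to compare data node to data node results against data node to witness results.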

Packet Capture Results

  • Consistent UDP traffic attempts were observed from data nodes to the witness appliance on port 12321, but no return traffic was received. This indicates one-way communication between the nodes.

Verification Commands:

tcpdump-uw -i <vmk-interface> | grep <witness-IP>

11:04:05.501279 IP <Datanode IP>.12321 > <WitnessFQDN>.12321: UDP, length 440
11:04:06.501250 IP <Datanode IP>.12321 > <WitnessFQDN>.12321: UDP, length 440
11:04:07.501295 IP <Datanode IP>.12321 > <WitnessFQDN>.12321: UDP, length 440
11:04:08.501281 IP <Datanode IP>.12321 > <WitnessFQDN>.12321: UDP, length 440

pktcap-uw --vmk <vmk-interface> --dir 2 -o - | tcpdump-uw -ner - | grep <witness-IP>

The name of the vmk is <vmk-interface>.
pktcap: The output file is -.
pktcap: No server port specifed, select 16##4 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 16##4.
pktcap: Main thread: 895#####68.
pktcap: Dump Thread: 895####76.
pktcap: The output file format is pcapng.
pktcap: Recv Thread: 895#####60.
pktcap: Accept...
reading from file -pktcap: Vsock connection from port 1##7 cid 2.
, link-type EN10MB (Ethernet), snapshot length 65##5
11:30:45.513987 00:##:##:##:##:e7 > 00:##:##:##:##:ff, ethertype IPv4 (0x0800), length 482: <Datanode IP>.12321 > <WitnessIP>.12321: UDP, length 440
11:30:46.514027 00:##:##:##:##:e7 > 00:##:##:##:##:ff, ethertype IPv4 (0x0800), length 482: <Datanode IP>.12321 > <WitnessIP>.12321: UDP, length 440
11:30:47.514035 00:##:##:##:##:e7 > 00:##:##:##:##:ff, ethertype IPv4 (0x0800), length 482: <Datanode IP>.12321 > <WitnessIP>.12321: UDP, length 440
11:30:48.514080 00:##:##:##:##:e7 > 00:##:##:##:##:ff, ethertype IPv4 (0x0800), length 482: <Datanode IP>.12321 > <WitnessIP>.12321: UDP, length 440
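The one-way pattern in the captures above can also be confirmed by counting packets in each direction. The following is a minimal sketch, assuming the tcpdump-uw text format shown above (with or without the link-layer header printed by -e); cmmds_direction is a hypothetical helper name, and the witness argument must match the address or name exactly as it appears in the capture.

```shell
# cmmds_direction WITNESS: read tcpdump-uw text on stdin and count UDP 12321
# packets sent to WITNESS and received from WITNESS.
# The last "src > dst" pair on each line is taken as the IP pair, so lines
# that also show MAC addresses (tcpdump -e) are handled.
cmmds_direction() {
    witness="$1"
    awk -v w="$witness" '
        {
            src = ""; dst = ""
            for (i = 2; i < NF; i++)
                if ($i == ">") { src = $(i - 1); dst = $(i + 1) }
            sub(/:$/, "", dst)               # drop the trailing ":" after dst
            if (dst == (w ".12321")) tx++    # data node -> witness
            if (src == (w ".12321")) rx++    # witness -> data node
        }
        END { printf "sent=%d received=%d\n", tx, rx }
    '
}
```

A result such as `sent=4 received=0` over a capture taken in both directions matches the blocked-return-traffic symptom described above, while a healthy link would be expected to show non-zero counts on both sides.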

Firewall Verification: 

Testing with nc confirmed that UDP port 12321 was blocked, while other ports (for example, TCP port 2233) were open.

  • Port 12321: Blocked (nc returns no "succeeded" message)

[root@esxihost] nc -u <witness IP> 12321

  • Port 2233: Open

[root@esxihost] nc -zv <witness IP> 2233

Connection to <witness ip> 2233 port [tcp/*] succeeded!

Advanced Parameters

  • Both advanced parameters were set to 0 (disabled), confirming the default configuration.

    • /VSAN/IgnoreClusterMemberListUpdates

[root@esxihost:~] esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
Value of IgnoreClusterMemberListUpdates is 0

    • /VSAN/DOMPauseAllCCPs 

[root@esxihost:~] esxcfg-advcfg -g /VSAN/DOMPauseAllCCPs
Value of DOMPauseAllCCPs is 0
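Both checks above can be run in one pass on each host. The following is a minimal sketch, assuming the "Value of ... is N" output format shown above and that 0 is the expected value for both options, as observed in this case; advcfg_value and check_vsan_advcfg are hypothetical helper names.

```shell
# advcfg_value: read `esxcfg-advcfg -g` output on stdin and print only the value.
advcfg_value() {
    sed -n 's/^Value of .* is \(.*\)$/\1/p'
}

# check_vsan_advcfg: report whether the two options checked above are 0.
# Run on an ESXi host; returns non-zero if either option differs from 0.
check_vsan_advcfg() {
    rc=0
    for opt in /VSAN/IgnoreClusterMemberListUpdates /VSAN/DOMPauseAllCCPs; do
        val=$(esxcfg-advcfg -g "$opt" | advcfg_value)
        if [ "$val" = "0" ]; then
            echo "$opt: 0 (expected)"
        else
            echo "$opt: '$val' (expected 0)"
            rc=1
        fi
    done
    return $rc
}
```

Calling `check_vsan_advcfg` on each data node gives a quick way to rule out these settings before moving on to network checks.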

Environment

VMware vSAN 8.x

Cause

Traffic over UDP port 12321, which is used by the vSAN Cluster Monitoring, Membership, and Directory Service (CMMDS) process, was blocked or filtered by a firewall or network security policy.

This blockage prevented heartbeat communication between the data nodes and the witness appliance, resulting in a cluster partition.

Resolution

  • Engage the network/firewall team to verify connectivity between vSAN data nodes and the witness appliance.
  • Ensure bidirectional communication is permitted on UDP port 12321 across all stretched cluster sites.
  • Once the port is open and communication is restored, vSAN cluster health should automatically recover, and witness connectivity will be re-established.
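Recovery can be confirmed from the command line as well as from Skyline Health. The following is a minimal sketch that assumes the output of `esxcli vsan cluster get` contains a "Sub-Cluster Member Count" line; member_count is a hypothetical helper name, and the expected total (all data nodes plus the witness) is an assumption to adjust for the environment.

```shell
# member_count: read `esxcli vsan cluster get` output on stdin and print the
# number of members this host currently sees in its cluster partition.
# Assumes a line of the form "Sub-Cluster Member Count: N".
member_count() {
    sed -n 's/^ *Sub-Cluster Member Count: *\([0-9][0-9]*\).*/\1/p'
}
```

For example, `esxcli vsan cluster get | member_count` run on a data node should return the number of data nodes plus one for the witness once the partition has cleared; a lower number suggests the witness is still not seen by that host.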

Additional Information

This issue can occur when network security devices, such as firewalls or traffic filters, interrupt or misroute vSAN communication over the required UDP ports.
It is recommended to include UDP 12321 in the allowed ports list for vSAN environments to prevent similar partition scenarios.

For more information on the ports required for vSAN, refer to vSAN ports.