Troubleshooting isolated the issue to network port accessibility. Although `vmkping` confirmed basic Layer 3 reachability and port 12321 was confirmed open, vSAN communication between the data nodes and the witness was still incomplete.
VMware vSAN 8.0.x
The root cause is network-level blocking of essential vSAN ports. Specifically, ports 23451 and 12345 are not open on the network path between the vSAN data nodes and the witness host.
vSAN unicast communication with the witness requires several distinct ports. UDP port 12321 carries the unicast agent traffic to the witness host, while UDP ports 12345 and 23451 are used by the vSAN Clustering Service (CMMDS), which maintains cluster membership and distributes cluster metadata. Even with ICMP and port 12321 open, blocking 12345 and 23451 prevents the witness from exchanging cluster metadata with the data nodes, which results in the network partition state.
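The reasoning above can be modeled as a simple all-or-nothing check: the witness link is only treated as healthy when every required port is reachable, so ICMP plus port 12321 alone is not enough. This is a deliberately simplified, hypothetical sketch; the function `witness_link_state` is illustrative only, it covers just the three ports discussed in this article, and actual partition detection is performed by CMMDS, not by a port list.

```shell
#!/bin/sh
# Hypothetical, simplified model: the witness link is "healthy" only if every
# port in REQUIRED_VSAN_PORTS is open. This is NOT how vSAN detects partitions.
REQUIRED_VSAN_PORTS="12321 12345 23451"

# $1: space-separated list of ports known to be open on the network path.
witness_link_state() {
    open_ports=" $1 "
    missing=""
    for port in $REQUIRED_VSAN_PORTS; do
        case "$open_ports" in
            *" $port "*) ;;                              # port is open
            *) missing="${missing:+$missing }$port" ;;   # record blocked port
        esac
    done
    if [ -z "$missing" ]; then
        echo "healthy"
    else
        echo "partitioned (blocked: $missing)"
    fi
}

witness_link_state "12321"              # prints: partitioned (blocked: 12345 23451)
witness_link_state "12321 12345 23451"  # prints: healthy
```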
1. Engage the local network/firewall administration team to open TCP and UDP ports 12345 and 23451 bidirectionally between all vSAN data nodes and the vSAN witness node.
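2. Once the network team confirms the ports are open, verify reachability from each ESXi data node using `nc`. Because UDP is connectionless, a successful `nc -u -z` result only means that no ICMP port-unreachable reply was received, so treat it as a basic check rather than proof that the witness is listening: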
nc -u -z <witness-node-IP> 12321
nc -u -z <witness-node-IP> 12345
nc -u -z <witness-node-IP> 23451
3. Monitor the vSAN Health UI. The witness node should automatically rejoin the subcluster once communication is restored.
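To repeat the `nc` checks from the resolution steps on several hosts, they can be wrapped in a small script. This is a minimal sketch, assuming the ESXi shell's `nc` accepts `-u` and `-z` as used above; the function name `check_vsan_ports`, the `WITNESS_IP` variable, and the script filename are hypothetical and not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper: probe the witness ports from this article from one ESXi host.
VSAN_WITNESS_PORTS="12321 12345 23451"

# $1: witness node IP. Returns the number of ports whose nc probe failed.
check_vsan_ports() {
    witness_ip="$1"
    failures=0
    for port in $VSAN_WITNESS_PORTS; do
        # Exit status 0 only means no ICMP port-unreachable reply was seen.
        if nc -u -z "$witness_ip" "$port" >/dev/null 2>&1; then
            echo "UDP $port: no ICMP rejection"
        else
            echo "UDP $port: FAILED"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}

# Run only when a witness IP is supplied, e.g.:
#   WITNESS_IP=<witness-node-IP> sh check_vsan_ports.sh
if [ -n "${WITNESS_IP:-}" ]; then
    check_vsan_ports "$WITNESS_IP"
fi
```

Run it on each data node; a nonzero exit status equals the number of ports whose probe failed.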
Opening ports 12345 and 23451 restores the network paths required by the vSAN Clustering Service (CMMDS). The witness node can then participate in cluster master/agent elections and synchronize object metadata, which resolves the network partition and returns the stretched cluster to a healthy state.
Ref - https://ports.broadcom.com/home/vSAN
For more information about vSAN Cluster Partition - Witness partitioned