To resolve this issue:
- First, verify that the witness is running the exact same ESXi version and build as the data nodes (a version-check example follows this list). If it is not, update the witness using your standard practices for upgrading hosts to match the data nodes: Upgrading ESXi Hosts. If necessary, re-deploy a new witness appliance on the same build as the data nodes.
- Check for a duplicate IP address on the witness appliance:
- Shut down the witness appliance and ping its IP address to confirm that no other device or system in the environment is already using the same IP.
- Then verify that there are no communication issues between the data nodes and the witness appliance before proceeding. (See Testing VMkernel network connectivity with the vmkping command (1003728) for more details on using vmkping.)
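For the version check, a quick way to compare the witness and the data nodes is, for example:
esxcli system version get
The Version and Build fields reported on the witness must match those reported on the data nodes.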
Note: Only the Primary node and the Backup node communicate with the witness over port 12321.
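To confirm which hosts currently hold the Primary (Master) and Backup roles, the cluster state can be queried on any node, for example:
esxcli vsan cluster get
The Local Node State field typically reports MASTER, BACKUP, or AGENT for the host the command is run on.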
Verify connectivity by running the vmkping tests below between the vSAN vmkernel adapters; the witness node (or any other node) cannot join the cluster if packets are fragmented.
To test 1500 MTU, run the command: vmkping -I vmkX x.x.x.x -d -s 1472
To test 9000 MTU, run the command: vmkping -I vmkX x.x.x.x -d -s 8972
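For example, assuming vmk1 is the vSAN vmkernel interface and 192.168.110.20 is the witness vSAN IP (both placeholders), the two tests would look like:
vmkping -I vmk1 192.168.110.20 -d -s 1472
vmkping -I vmk1 192.168.110.20 -d -s 8972
The -d flag sets the do-not-fragment bit, and -s sets the ICMP payload size; the payload is 28 bytes smaller than the MTU to leave room for the IP and ICMP headers.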
In addition, check the connectivity between the witness appliance and the data nodes via port 12321.
On the witness: tcpdump-uw -i vmkX
(vmkX is the vmkernel port used for vSAN traffic)
If there is connectivity, you will see incoming requests from, and responses to, both the Primary node and the Backup node over port 12321.
Example: (Note: Also verify that it is reaching out to the correct IP/FQDN.)
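As an illustrative sketch (the original capture output is not reproduced here), the capture on the witness can be narrowed to the witness port, assuming vmk1 carries the vSAN traffic:
tcpdump-uw -i vmk1 udp port 12321
Packets should be seen arriving from both the Primary and Backup nodes, with responses going back to them.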

On the Primary/Backup node: tcpdump-uw -i vmkX | grep <witness IP/FQDN>
(vmkX is the vmk for witness traffic)
If working correctly, you will see the node reaching out to the witness over port 12321 and the corresponding responses.
Example:
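Similarly, on the Primary or Backup node the capture can be limited to traffic to and from the witness; assuming vmk1 is the vmkernel port used for witness traffic and 192.168.110.20 is the witness IP (placeholders):
tcpdump-uw -i vmk1 udp port 12321 and host 192.168.110.20
Both the outgoing requests and the witness responses should appear.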

If there is no connectivity over port 12321, resolve that first. If there is connectivity, proceed.
Traffic on port 12321 must be allowed bidirectionally so that the vSAN Cluster Monitoring, Membership, and Directory Service (CMMDS) can exchange heartbeats; this applies to all vSAN cluster architectures.
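As an additional sanity check (a sketch, not a required step), you can confirm that the host has a UDP socket open on port 12321 with, for example:
esxcli network ip connection list | grep 12321
A udp entry on port 12321 should indicate that the clustering (CMMDS) agent is listening; actual traffic between the nodes and the witness still needs to be verified with the captures above.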
Please note: Check whether any security device (firewall) or network filter is dropping vSAN <--> witness session traffic on ports 12321/2233.
If a disrupted UDP flow session is found on the firewall, that session needs to be manually flushed.
Where there is a disruption in the transit path, existing UDP sessions may enter a Discard state before reaching their timeout and become stale. Sessions in the Discard state continue to be refreshed and will discard any incoming traffic that matches them. After the existing session is cleared and a new UDP session is created on the firewall, the UDP stream resumes functioning bidirectionally.
Example A:
This is a common behavior in some firewalls with UDP sessions, particularly when the application uses the same port on both the source and destination systems. To address this issue permanently, consider reducing the default UDP session timeout on the firewall.
Output collected from firewall logs:
xxxxx unknown-udp DISCARD FLOW 10.xxx.xxx.xxxx[12321]/Shell/17 (10.xxx.xxx.xxxx[12321])
xxxxx 10.xxx.xxx.xxxx[12321]/Untrusted (10.xxx.xxx.xxxx[12321])
- Once you have verified that the firewall is not dropping sessions or blocking port communication (or that there is no firewall in use between the data sites and the witness), follow these steps:
1. Put the witness appliance in maintenance mode with Ensure Accessibility.
2. Disable the stretched cluster in the GUI: Configure > vSAN > Fault Domains and Stretched Clusters.
3. SSH into the witness appliance and manually dismantle the disk group; the disk group UUID can be identified with the listing sketch after these steps. (See How to manually remove and recreate a vSAN disk group using esxcli (2150567) for details on this process.)
IMPORTANT: Before dismantling any disk group, ensure you are on the correct host and targeting the correct disk group. Maintenance Mode with Ensure Accessibility is recommended.
esxcli vsan storage remove -u <VSAN Disk Group UUID>
or
esxcli vsan storage remove -s <VSAN Disk Group Cache Identifier>
For ESA enabled clusters:
esxcli vsan storagepool remove -u <VSAN Device UUID>
or
esxcli vsan storagepool remove -d <VSAN Device ID>
4. Re-enable the stretched cluster, follow the wizard, and have it create new disks to house the witness components.
This should re-form the cluster successfully and allow the witness components to rebuild on the newly created virtual disks. If this fails, you may need to re-deploy the witness appliance.
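For step 3, the identifiers can be confirmed before anything is removed; as a sketch, on the witness run:
esxcli vsan storage list
to list the claimed devices together with their VSAN Disk Group UUID. On an ESA-enabled cluster, the equivalent listing is expected to be:
esxcli vsan storagepool list
Double-check the host name in the SSH prompt before running any remove command.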
- Sometimes a user might accidentally configure the 'witness' traffic type on a vmkernel adapter of the witness appliance itself. Note that in a Witness Traffic Separation (WTS) setup this tag is meant for the data nodes only; the 'witness' traffic type should be removed if it is found on a vmkernel adapter (vmk) of the witness appliance, as shown in the sketch below.
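To check for and correct this, a sketch along these lines can be used on the witness appliance, assuming vmk1 is the adapter in question (placeholder):
esxcli vsan network list
shows each vSAN-enabled vmkernel adapter and its Traffic Type (vsan or witness). If a vmk on the witness appliance reports the witness traffic type, it can be removed with:
esxcli vsan network remove -i vmk1
and the adapter re-added with the default vsan traffic type using esxcli vsan network ip add -i vmk1.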
5. If the networking issue above is still not resolved, deploy a new vSAN witness appliance and replace the existing one.
Refer to the documents below for more details.