Troubleshooting vSAN Witness Node Isolation

Article ID: 315546

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • A vSAN Witness node (Virtual or Physical) is isolated.
    To confirm witness node isolation run the command: 

esxcli vsan cluster get


If the output of the command returns:
Sub-Cluster Member Count: 1
Local Node State: STANDALONE

Or 

Sub-Cluster Member Count: 0
Local Node State: Discovery


Then the Witness is confirmed to be isolated from the vSAN Cluster.

  • The vSAN Witness node cannot form a cluster with the remaining vSAN data nodes in a stretched cluster configuration.
  • Pinging the Witness node from a vSAN ESXi host fails.
  • Pinging an ESXi host from the Witness works with the default packet size, but fails with a full-size (MTU-sized) frame. Use the following vmkping command to test connectivity:
    vmkping -I <witness-vmk#> <vsan-IPaddress> -s <icmp-data-size> -d

    Note: -d sets the 'don't fragment' (DF) bit on the IPv4 packet.
    -s sets the ICMP payload size in bytes. Use 8972 for MTU 9000 and 1472 for MTU 1500 (the MTU minus 28 bytes of IPv4 and ICMP headers).
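
The isolation check from the first symptom can be scripted. The following is a minimal sketch, assuming the output of esxcli vsan cluster get is piped in on stdin and contains the two fields quoted above. The function name and the labels it prints (isolated, not-isolated, unknown) are illustrative and are not part of any VMware tool.

```shell
# Classify witness isolation from "esxcli vsan cluster get" output on stdin.
# Hypothetical helper: the name and the printed labels are illustrative only.
witness_isolation_state() {
    awk -F': *' '
        /Sub-Cluster Member Count/ { count = $2 + 0; seen = 1 }
        /Local Node State/         { state = toupper($2) }
        END {
            # Trim trailing whitespace or carriage returns from the state value.
            sub(/[ \t\r]+$/, "", state)
            if (!seen || state == "")
                print "unknown"
            else if ((count == 1 && state == "STANDALONE") ||
                     (count == 0 && state == "DISCOVERY"))
                print "isolated"
            else
                print "not-isolated"
        }'
}
```

On the Witness node this could be used as: esxcli vsan cluster get | witness_isolation_state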



Environment

VMware vSAN
 

Resolution

In a vSAN stretched cluster, the Witness plays an important role: it hosts the witness components of the vSAN objects and keeps them available.
 
To ensure proper TCP/IP communication between the data hosts and the Witness, these requirements exist:
  • Round-Trip Time (RTT) latency between the Witness and the ESXi hosts must be less than 200 ms (500 ms in a ROBO cluster; 100 ms with 11-20 nodes per site).
  • A full-size frame must pass between the hosts and the Witness without fragmentation. With MTU 1500, an unfragmented ICMP payload of 1472 bytes must get through.
 
  • To verify if the payload can be sent, run this command from one of the ESXi hosts: 
    vmkping -I <VSANvmknic> <WitnessIP> -s 1472 -d -c20
    If the ping fails, something on the network path is preventing the full payload from traveling between the ESXi host and the Witness node.
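
The payload sizes used in these tests follow from the MTU: 20 bytes of IPv4 header (without options) and 8 bytes of ICMP header are subtracted. As a rough sketch, the helpers below compute the payload and print the matching vmkping command without running it. The function names are hypothetical, and the interface name and address in the usage line are placeholders.

```shell
# ICMP payload that fills one frame: MTU minus 20 bytes (IPv4 header,
# no options) minus 8 bytes (ICMP header).
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (do not execute) the vmkping command for a given test.
# $1 = VMkernel interface, $2 = target IP address, $3 = MTU
vmkping_cmd() {
    echo "vmkping -I $1 $2 -s $(payload_for_mtu "$3") -d -c 20"
}
```

For example, vmkping_cmd vmk1 192.0.2.10 1500 prints a command with -s 1472. If the network is configured for jumbo frames, passing 9000 yields -s 8972.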
 
  • Verify the unicast agent table on the ESXi hosts by running the following command:
    • esxcli vsan cluster unicastagent list 
    • The Witness should appear with the value 1 in the "IsWitness" column.
  • If the Witness does not appear in the unicastagent list, add it as follows:
    • On the Witness node, run esxcli vsan cluster get and note the Local Node UUID.
    • On each ESXi host, run: esxcli vsan cluster unicastagent add -t witness -u <local_UUID> -U true -a <vSAN IP address> -p 12321
 
  • Verify the vSAN tags with the command 
  • Verify the ESXi version of the Witness is the same build as the rest of the cluster, as version mismatch will prevent the Witness node from joining the cluster.
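
The unicast agent check above can be automated on each data host. The sketch below assumes the output of esxcli vsan cluster unicastagent list is whitespace-separated, with the node UUID in the first column and the witness flag (0 or 1) in the second; verify that layout against your build before relying on it. The function name is hypothetical.

```shell
# Print the UUID of every witness entry found in
# "esxcli vsan cluster unicastagent list" output read from stdin.
# ASSUMPTION: column 1 = node UUID, column 2 = witness flag (0 or 1).
# Exit status is 0 if at least one witness entry exists, 1 otherwise.
witness_entries() {
    awk '$2 == "1" { print $1; found = 1 } END { exit found ? 0 : 1 }'
}
```

A possible use on a data host: esxcli vsan cluster unicastagent list | witness_entries || echo "Witness missing from unicast agent list"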


Recommendation

  • The Management (vmk0) and WitnessPg (vmk1) VMkernel interfaces on the vSAN Witness node must not be configured to use addresses on the same subnet.
    This creates a Multihoming situation, referenced in Article 318546.
    If only a single subnet is available for the vSAN Witness node, it is recommended to untag vSAN traffic on vmk1 and tag vSAN traffic on vmk0 on the vSAN Witness node.
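
To check whether vmk0 and vmk1 fall into the same subnet, their IPv4 addresses can be compared under the netmask. This is a minimal sketch with hypothetical function names; it assumes plain dotted-quad IPv4 values without leading zeros, such as those reported by esxcli network ip interface ipv4 get.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) when two addresses are in the same network.
# $1 = first IP, $2 = second IP, $3 = netmask
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, same_subnet 192.168.1.10 192.168.1.20 255.255.255.0 succeeds, which would indicate the multihoming situation described above if those were the vmk0 and vmk1 addresses.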

 

Additional Information