VMs unresponsive on vSAN Stretched cluster after witness added or changed

Article ID: 393969

Products

VMware vSAN

Issue/Introduction

When adding or replacing a witness in a vSAN stretched cluster, an alarm regarding the witness may be raised.

Additionally, VMs may become unresponsive.

Environment

VMware vSAN (All Versions)

Cause

In a stretched cluster, an MTU mismatch can cause the witness to repeatedly join and leave the cluster, which can leave VMs unresponsive until the mismatch is corrected.

Resolution

To fix this issue, ensure the witness network configuration meets the cluster's requirements before adding the witness or switching to it.
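
One way to compare the configured MTUs before adding the witness is to list the MTU of each VMkernel interface on every host. The following is a minimal sketch, not part of the original procedure: list_vmk_mtus is a hypothetical helper name, and the sketch assumes the output of esxcli network ip interface list contains a "Name:" and an "MTU:" field for each interface, which may vary between releases.

```shell
# Print "<vmknic> <mtu>" pairs from `esxcli network ip interface list`
# output supplied on stdin.
# Assumption: each interface block has indented "Name:" and "MTU:" fields.
list_vmk_mtus() {
    awk -F': *' '
        /^ *Name:/ { name = $2 }        # remember the current interface name
        /^ *MTU:/  { print name, $2 }   # emit the name with its MTU
    '
}

# On an ESXi host (not executed here):
#   esxcli network ip interface list | list_vmk_mtus
```

Running this on each data node and on the witness and comparing the values for the interfaces that carry vSAN or witness traffic should reveal a mismatch, if one exists.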

To verify MTUs, use the vmkping command with the -d option so that the ping is not fragmented.

vmkping -I vmkx <ip address> -s 8972 -d


Example: 

[root@######] vmkping -I vmk1 ###.###.#.## -s 8972 -d
PING ###.###.#.## (###.###.#.##): 8972 data bytes
8980 bytes from 192.168.7.13: icmp_seq=0 ttl=64 time=0.210 ms
8980 bytes from 192.168.7.13: icmp_seq=1 ttl=64 time=0.346 ms
8980 bytes from 192.168.7.13: icmp_seq=2 ttl=64 time=0.175 ms
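
The 8972 value passed to -s corresponds to a 9000-byte MTU: an unfragmented IPv4 ping has to fit a 20-byte IPv4 header (without options) and an 8-byte ICMP header in addition to the payload. This is also why the replies above report 8980 bytes, which is the 8972-byte payload plus the ICMP header. As a small sketch, the same arithmetic for other MTUs could look like this (icmp_payload_for_mtu is a hypothetical helper name, not a VMware tool):

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet at a given MTU:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # prints 8972
icmp_payload_for_mtu 1500   # prints 1472
```

For a standard 1500-byte MTU, the equivalent test would therefore use -s 1472 with -d.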

For more information please see: vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check)


To help prevent this issue, it is also recommended to separate vSAN witness traffic from vSAN data traffic when running vSAN 6.7 or newer. Please see: Understanding Mixed MTU support in Stretched & 2 Node vSAN 6.7 U1 clusters with Witness Traffic Separation

If witness traffic is separated, mixed MTUs on the witness are supported.

This can be accomplished by tagging a separate VMkernel port for witness traffic (for example, the management VMkernel port or a newly created VMkernel port).

The tagging must be done with esxcli, and only on the data nodes, not on the witness.

(Note: It is recommended to implement this only during a maintenance window, with the witness in maintenance mode, or before adding the witness to the cluster.)

To tag the VMkernel port for witness traffic, use the following command:

esxcli vsan network ip add -i vmkx -T witness 

To verify that the VMkernel port is tagged appropriately, use the following command:

esxcli vsan network list

Example:

[root@esxi01:~] esxcli vsan network list

Interface
   VmkNic Name: vmk0
   IP Protocol: IP
   Interface UUID: 8cf3ec57-####-####-####-#############
   Agent Group Multicast Address: ###.#.#.#
   Agent Group IPv6 Multicast Address: ####::#:#:#
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: ###.###.#.##
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: witness

Interface
   VmkNic Name: vmk1
   IP Protocol: IP
   Interface UUID: 6df3ec57-####-####-####-#############
   Agent Group Multicast Address: ###.#.#.#
   Agent Group IPv6 Multicast Address: ####::#:#:#
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: ###.###.#.##
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: vsan
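
To check the tagging on several data nodes without reading the full listing, the output can be reduced to one line per interface. The following is a minimal sketch under the assumption that the output keeps the "VmkNic Name:" and "Traffic Type:" fields shown in the example above; vsan_traffic_types is a hypothetical helper name.

```shell
# Print "<vmknic> <traffic type>" pairs from `esxcli vsan network list`
# output supplied on stdin.
# Assumption: each Interface block has "VmkNic Name:" and "Traffic Type:" fields.
vsan_traffic_types() {
    awk -F': *' '
        /^ *VmkNic Name:/  { nic = $2 }        # remember the current interface
        /^ *Traffic Type:/ { print nic, $2 }   # emit it with its traffic type
    '
}

# On an ESXi data node (not executed here):
#   esxcli vsan network list | vsan_traffic_types
```

For the example output above, this would list vmk0 as witness and vmk1 as vsan.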