vSAN cluster partition after physical switch replacement.

Article ID: 426643

Updated On:

Products

VMware vSAN

Issue/Introduction

  • Hosts in the vSAN cluster appear in a partitioned state (split across different groups) after the physical switches used for vSAN networking are replaced.

  • vSAN Skyline Health may report the hosts in different network partition groups.

Environment

  • VMware vSAN 8.x
  • VMware vSAN 9.x

Cause

  • This issue occurs when the MTU is configured incorrectly or only partially on the new switches.

  • On verification, hosts can communicate with their neighbor hosts at MTU 1500. This confirms that the switch ports are configured with the correct VLANs.

  • However, at MTU 9000, which is the MTU configured on the vSAN vmk ports, communication fails for one or more hosts, though not necessarily for all of them. This confirms an MTU configuration issue on the new switches.

  • The snippets below show the pattern: jumbo-frame pings (MTU 9000, payload 8972) succeed for some hosts and fail for others, while standard-size pings (MTU 1500, payload 1472) succeed.

------------------------------------------------------------
[root@host :~ ] vmkping -I vmk1 10.##.##.23 -s 1472 -d -c 3
PING 10.##.##.23 (10.##.##.23): 1472 data bytes
1480 bytes from 10.##.##.23: icmp_seq=0 ttl=64 time=0.196 ms
1480 bytes from 10.##.##.23: icmp_seq=1 ttl=64 time=0.306 ms
1480 bytes from 10.##.##.23: icmp_seq=2 ttl=64 time=0.336 ms

--- 10.##.##.23 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.196/0.279/0.336 ms
------------------------------------------------------------
[root@host :~ ] vmkping -I vmk1 10.##.##.23 -s 8972 -d -c 3
PING 10.##.##.23 (10.##.##.23): 8972 data bytes
8980 bytes from 10.##.##.23: icmp_seq=0 ttl=64 time=0.424 ms
8980 bytes from 10.##.##.23: icmp_seq=1 ttl=64 time=0.218 ms
8980 bytes from 10.##.##.23: icmp_seq=2 ttl=64 time=0.392 ms

--- 10.##.##.23 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.218/0.345/0.424 ms
------------------------------------------------------------
[root@host :~ ] vmkping -I vmk1 10.##.##.20 -s 8972 -d -c 3
PING 10.##.##.20 (10.##.##.20): 8972 data bytes

--- 10.##.##.20 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
------------------------------------------------------------
[root@host :~ ] vmkping -I vmk1 10.##.##.20 -s 1472 -d -c 3
PING 10.##.##.20 (10.##.##.20): 1472 data bytes
1480 bytes from 10.##.##.20: icmp_seq=0 ttl=64 time=0.385 ms
1480 bytes from 10.##.##.20: icmp_seq=1 ttl=64 time=0.301 ms
1480 bytes from 10.##.##.20: icmp_seq=2 ttl=64 time=0.300 ms

--- 10.##.##.20 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.300/0.329/0.385 ms
------------------------------------------------------------
[root@host :~ ] vmkping -I vmk1 10.##.##.26 -s 8972 -d -c 3
PING 10.##.##.26 (10.##.##.26): 8972 data bytes

--- 10.##.##.26 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@host :~ ]
------------------------------------------------------------
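
The payload sizes used above follow from the MTU: with the don't-fragment flag (-d) set, the largest ICMP payload that fits is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header, so 9000 gives 8972 and 1500 gives 1472. The following is a minimal sketch of how the same check could be repeated for several peers. The function names and the PING_CMD override are hypothetical helpers introduced for illustration, and the pass condition assumes the statistics line has the format shown in the snippets above.

```shell
#!/bin/sh
# Largest ICMP payload for a given MTU:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# check_peer VMK IP MTU
# Sends 3 don't-fragment pings sized for MTU and prints one result line.
# PING_CMD is a hypothetical override (default: vmkping) for dry runs.
check_peer() {
    vmk=$1; ip=$2; mtu=$3
    size=$(payload_for_mtu "$mtu")
    # Pass only when the statistics line reports exactly 0% packet loss.
    if ${PING_CMD:-vmkping} -I "$vmk" "$ip" -s "$size" -d -c 3 2>&1 \
        | grep -q ' received, 0% packet loss'; then
        echo "$ip mtu=$mtu size=$size ok"
    else
        echo "$ip mtu=$mtu size=$size FAIL"
    fi
}
```

For example, running check_peer vmk1 <peer-ip> 1500 and check_peer vmk1 <peer-ip> 9000 for each vSAN peer would list the hosts that pass at 1500 but fail at 9000, which are the ones behind a switch path that does not carry jumbo frames.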

Resolution

  1. Determine whether the hosts are spread across one or more server racks, and whether each rack's top-of-rack (ToR) switches are standalone for the hosts in that rack.

  2. When replacing the switches, ensure the required MTU is configured at every level: on the ToR switches as well as on any intermediate switches between them.

  3. Every switch in the vSAN network path must allow the MTU configured on the vSAN vmk ports.
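
When reporting the problem to the network team, it can help to know the largest frame the path to a failing host actually carries. The following is a rough sketch that binary-searches the ping payload size between a value known to pass (for example 1472) and one known to fail (for example 8972). The function names and the PING_CMD override are hypothetical, and success is again judged from the statistics line format shown in the snippets above.

```shell
#!/bin/sh
# ping_ok VMK IP SIZE: true when 3 don't-fragment pings of SIZE bytes all return.
# PING_CMD is a hypothetical override (default: vmkping) for dry runs.
ping_ok() {
    ${PING_CMD:-vmkping} -I "$1" "$2" -s "$3" -d -c 3 2>&1 \
        | grep -q ' received, 0% packet loss'
}

# max_payload VMK IP LO HI
# Precondition: payload LO passes and payload HI fails.
# Prints the largest payload that still passes.
max_payload() {
    vmk=$1; ip=$2; lo=$3; hi=$4
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if ping_ok "$vmk" "$ip" "$mid"; then
            lo=$mid    # mid fits through the path; search larger sizes
        else
            hi=$mid    # mid is dropped; search smaller sizes
        fi
    done
    echo "$lo"
}
```

Adding 28 to the printed payload gives the effective MTU of the path. A result well below 8972 for some hosts only would point at the specific switch or port that still has a smaller MTU.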