While patching ESXi hosts, a vSAN cluster may report cluster partition and network issues.

Article ID: 391620

Products

VMware vSAN

Issue/Introduction

Symptoms:

During host upgrades in a vSAN cluster, virtual machines may be reported as inaccessible.

The vSAN cluster hosts may also appear in a network partition state.

The issue is observed after a firmware upgrade on multiple hosts and affects the accessibility of multiple virtual machines.

Environment

VMware vSAN 7.x

Cause

The issue occurs when network latency between data nodes exceeds 5 ms.

Some hosts may have healthy bandwidth and latency, while others show elevated latency.

This can be validated with the vmkping command, as shown below.

A ping test from host esx1 to the other hosts revealed latency and packet loss.

[root@esx1:~] for i in `localcli vsan cluster unicastagent list | grep true | awk '{ print $4 }'`; do echo "pinging $i 3 times"; echo; vmkping -I vmk1 $i -s 1472 -d -c 1000 -i .005; done

pinging 10.##.###.### 3 times
--
1480 bytes from 10.##.###.### : icmp_seq=997 ttl=64 time=683.752 ms
1480 bytes from 10.##.###.### : icmp_seq=998 ttl=64 time=677.045 ms
1480 bytes from 10.##.###.### : icmp_seq=999 ttl=64 time=671.718 ms

--- 10.##.###.### ping statistics ---
1000 packets transmitted, 969 packets received, 3.1% packet loss
round-trip min/avg/max = 268.792/596.461/878.957 ms 
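
The summary lines above can be checked automatically when many hosts are tested. The following is a minimal sketch, not part of the original procedure: `check_ping_summary` is a hypothetical helper name, the 5 ms default comes from the Cause section, and treating any packet loss as suspect is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads ping/vmkping output on stdin and classifies it.
# $1 is the average-latency limit in ms (defaults to the 5 ms quoted above).
# Assumption: any packet loss at all is also treated as suspect.
check_ping_summary() {
    threshold_ms="${1:-5}"
    awk -v limit="$threshold_ms" '
        /packet loss/ {
            # e.g. "1000 packets transmitted, 969 packets received, 3.1% packet loss"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
        }
        /min\/avg\/max/ {
            # e.g. "round-trip min/avg/max = 268.792/596.461/878.957 ms"
            split($0, halves, "= ")
            split(halves[2], rtt, "/")
            avg = rtt[2]
        }
        END {
            # Without both summary lines there is nothing to classify.
            if (avg == "" || loss == "") { print "NO-DATA"; exit 2 }
            status = (avg + 0 > limit + 0 || loss + 0 > 0) ? "SUSPECT" : "OK"
            printf "%s avg=%sms loss=%s%%\n", status, avg, loss
        }
    '
}
```

For example, the output of a single test could be piped through it with `vmkping -I vmk1 <peer-ip> -s 1472 -d -c 1000 -i .005 | check_ping_summary`, where `<peer-ip>` stands for the address of the host being tested.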

Resolution

Because this indicates an issue with the physical network, it must be investigated at the physical networking layer. Working with the network team is recommended.

Alternatively, vSAN traffic can be moved to a different vmnic by bringing down the active network card that may be experiencing the latency.

Refer to Troubleshooting the vSAN Network for more information.
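
The vmnic swap could be approached along the following lines. This is a dry-run sketch under stated assumptions: `plan_vmnic_swap` is a hypothetical helper that only prints the commands for review, and `vmnic1` in the usage note is a placeholder for the suspect uplink. It assumes a second healthy uplink is configured for the vSAN VMkernel port group; without one, bringing the NIC down would isolate the host from vSAN traffic.

```shell
#!/bin/sh
# Dry-run sketch: prints the esxcli commands instead of executing them,
# so the sequence can be reviewed before it is run on a host.
# $1 is the suspect uplink (hypothetical example: vmnic1).
plan_vmnic_swap() {
    suspect_nic="$1"
    if [ -z "$suspect_nic" ]; then
        echo "usage: plan_vmnic_swap <vmnicN>" >&2
        return 1
    fi
    cat <<EOF
# 1. Confirm which VMkernel interface carries vSAN traffic
esxcli vsan network list
# 2. Review the physical NICs and their link state
esxcli network nic list
# 3. Bring down the suspect uplink so traffic fails over to the other uplink
esxcli network nic down -n $suspect_nic
# 4. After the physical issue is fixed, bring the uplink back up
esxcli network nic up -n $suspect_nic
EOF
}
```

Running `plan_vmnic_swap vmnic1` prints the sequence. After step 3, repeating the vmkping test shows whether latency has returned to a healthy level on the remaining uplink.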