Symptoms:
During a host upgrade on a vSAN cluster, virtual machines may be reported as inaccessible.
The vSAN cluster hosts may also show a network partition state.
The issue is observed after a firmware upgrade on multiple hosts and affects the accessibility of multiple virtual machines.
VMware vSAN 7.x
The issue occurs when latency between data nodes exceeds 5 ms.
Some hosts show healthy bandwidth and latency, while others show high latency.
This can be validated with the vmkping command, as shown below.
A ping test from host esx1 to the other hosts revealed latency and packet loss.
[root@esx1:~] for i in `localcli vsan cluster unicastagent list | grep true | awk '{ print $4}'`; do echo "pinging $i 3 times"; echo; vmkping -I vmk1 $i -s 1472 -d -c 1000 -i .005 ; done
pinging 10.##.###.### 3 times
--
1480 bytes from 10.##.###.### : icmp_seq=997 ttl=64 time=683.752 ms
1480 bytes from 10.##.###.### : icmp_seq=998 ttl=64 time=677.045 ms
1480 bytes from 10.##.###.### : icmp_seq=999 ttl=64 time=671.718 ms
--- 10.##.###.### ping statistics ---
1000 packets transmitted, 969 packets received, 3.1% packet loss
round-trip min/avg/max = 268.792/596.461/878.957 ms
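With many hosts in the cluster, reading each summary by eye is error-prone. The following is a minimal sketch of a filter that reads vmkping output on standard input and flags a peer when the average round-trip time exceeds the 5 ms figure mentioned above. The function name check_vsan_ping is hypothetical, and treating any packet loss as suspect is an assumption made for illustration, not a documented vSAN threshold.

```shell
# Hypothetical helper: summarise vmkping output read from stdin.
# Assumes the summary format shown above ("... packet loss" and
# "round-trip min/avg/max = a/b/c ms").
check_vsan_ping() {
    awk -v limit_ms=5 '
        /packet loss/ {
            # Pick the field that ends in "%" and strip the percent sign.
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
        }
        /round-trip min\/avg\/max/ {
            # Field 4 holds "min/avg/max"; keep the average.
            split($4, rtt, "/"); avg = rtt[2]
        }
        END {
            # Assumption: any loss, or an average above limit_ms, is suspect.
            status = (avg + 0 > limit_ms || loss + 0 > 0) ? "SUSPECT" : "OK"
            printf "%s avg_ms=%s loss_pct=%s\n", status, avg, loss
        }'
}
```

It could be used by piping a single vmkping run into it, for example `vmkping -I vmk1 <peer IP> -s 1472 -d -c 1000 -i .005 | check_vsan_ping`, where vmk1 is the vSAN vmkernel interface as in the command above.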
Because this indicates a problem with the physical network, it must be investigated at the physical networking layer; working with the network team is recommended.
Alternatively, vSAN traffic can be moved to another vmnic by bringing down the active network card that is experiencing and reporting the latency.
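The uplink swap described above can be sketched with esxcli as follows. This is an illustration under assumptions, not a prescribed procedure: it assumes the vSAN vmkernel port has a second healthy uplink to fail over to, the function name vsan_uplink_down and the DRY_RUN switch are hypothetical, and vmnic2 in the usage note is a placeholder. By default the function only prints the commands so they can be reviewed first.

```shell
# Sketch only. DRY_RUN is a hypothetical switch introduced here:
# commands are printed unless DRY_RUN=0 is set explicitly.
vsan_uplink_down() {
    nic="$1"
    if [ "${DRY_RUN:-1}" = "0" ]; then run=""; else run="echo"; fi
    # Show the physical NICs and their link state before changing anything.
    $run esxcli network nic list
    # Bring the suspect uplink down so traffic moves to the remaining uplink.
    $run esxcli network nic down -n "$nic"
}
```

Running `vsan_uplink_down vmnic2` prints the two commands; `DRY_RUN=0 vsan_uplink_down vmnic2` executes them. The card can be brought back with `esxcli network nic up -n vmnic2` once the physical network has been checked.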
For more information, refer to Troubleshooting the vSAN Network.