[root@esxi1:~] vmkping -I vmk2 -d -s 8972 10.X.X.10
PING 10.X.X.10 (10.X.X.10): 8972 data bytes
--- 10.X.X.10 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
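In the above example, the vSAN node is configured with an MTU of 9000, so vmkping is run with a payload size of 8972 bytes (9000 minus the 20-byte IP header and the 8-byte ICMP header) and the don't-fragment option (-d) to test network connectivity with another vSAN node.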
Refer to KB: vSAN Skyline Health reports errors: vSAN: MTU check (ping with large packet size)
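The payload size passed to vmkping -s follows directly from the MTU. The snippet below is a minimal sketch of that arithmetic, assuming a plain IPv4 header without options; the function name is illustrative and not part of any VMware tool.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_payload_size(mtu: int) -> int:
    """Return the largest ICMP payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU is too small to carry an ICMP echo payload")
    return mtu - overhead


if __name__ == "__main__":
    print(vmkping_payload_size(9000))  # value to use with -s on a jumbo-frame network
    print(vmkping_payload_size(1500))  # value to use with -s on a standard network
```

For an MTU of 9000 this gives the 8972 used in the example above; for the default MTU of 1500 it gives 1472.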
The 'esxcli vsan cluster get' output shows a Sub-Cluster Member Count that is lower than the actual number of hosts in the vSAN cluster:
esxi1# esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2019-09-03T07:02:40Z
   Local Node UUID: ########-####-####-####-########7c0e
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: ########-####-####-####-########7c0e
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: ########-####-####-####-########2f67
   Sub-Cluster Membership Entry Revision: 0
   Sub-Cluster Member Count: 1
   Sub-Cluster Member UUIDs: ########-####-####-####-########7c0e
   Sub-Cluster Member HostNames: NODE2
   Sub-Cluster Membership UUID: ########-####-####-####-########7c0e
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: ########-####-####-####-########9e01 12 2019-08-19T09:12:12.1
VMware vSAN (All Versions)
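The member-count comparison described above can be scripted. The following is a rough sketch that assumes the 'esxcli vsan cluster get' output has been captured as text with one "Key: value" field per line, as the CLI prints it. The function names and the expected host count of 4 are hypothetical and only serve the illustration.

```python
def parse_vsan_cluster_get(output: str) -> dict:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so timestamps in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_members(output: str, expected_hosts: int) -> int:
    """Return how many hosts are absent from this node's view of the cluster."""
    fields = parse_vsan_cluster_get(output)
    return expected_hosts - int(fields["Sub-Cluster Member Count"])


if __name__ == "__main__":
    sample = """Cluster Information
   Enabled: true
   Current Local Time: 2019-09-03T07:02:40Z
   Local Node State: MASTER
   Sub-Cluster Member Count: 1
   Sub-Cluster Member HostNames: NODE2
"""
    # expected_hosts=4 is a made-up cluster size for this example.
    print(missing_members(sample, expected_hosts=4))
```

A non-zero result means the node sees fewer members than expected, which is consistent with a network partition.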
Step 1: Verify Network Health
If the underlying network issue affecting vSAN nodes is resolved, both the vSAN cluster partition and MTU-related issues should automatically clear without further intervention.
Step 2: Proceed with Additional Checks (if issue persists)
If the network problem is not yet resolved, continue with the following checks:
1. Run a vmkping test to check vSAN network connectivity.
In this example, packets are dropped when pinging the vSAN VMkernel interface of another node.
NODE2# vmkping -I vmk2 192.168.x.xxx -c 1000
PING 192.168.x.xxx (192.168.x.xxx): 56 data bytes
64 bytes from 192.168.x.xxx: icmp_seq=2 ttl=64 time=0.133 ms
64 bytes from 192.168.x.xxx: icmp_seq=3 ttl=64 time=0.111 ms
64 bytes from 192.168.x.xxx: icmp_seq=4 ttl=64 time=0.129 ms
64 bytes from 192.168.x.xxx: icmp_seq=5 ttl=64 time=0.133 ms
64 bytes from 192.168.x.xxx: icmp_seq=6 ttl=64 time=0.137 ms
64 bytes from 192.168.x.xxx: icmp_seq=7 ttl=64 time=0.140 ms
64 bytes from 192.168.x.xxx: icmp_seq=8 ttl=64 time=0.141 ms
64 bytes from 192.168.x.xxx: icmp_seq=9 ttl=64 time=0.127 ms
64 bytes from 192.168.x.xxx: icmp_seq=10 ttl=64 time=0.139 ms
64 bytes from 192.168.x.xxx: icmp_seq=11 ttl=64 time=0.087 ms <======= last reply before the gap
64 bytes from 192.168.x.xxx: icmp_seq=37 ttl=64 time=0.137 ms <======= sequences 12 to 36 were missed
64 bytes from 192.168.x.xxx: icmp_seq=38 ttl=64 time=0.151 ms
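With 1000 pings, gaps such as the one above are easy to overlook by eye. The snippet below is a minimal sketch that scans saved vmkping output for missing icmp_seq values. It assumes the output was saved as text, and the function name is illustrative. It only reports gaps between received replies, so replies missing before the first received one or after the last are not detected.

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_sequence_gaps(ping_output: str) -> list:
    """Return (first_missing, last_missing) for each run of missing icmp_seq values."""
    seqs = [int(m.group(1)) for m in SEQ_RE.finditer(ping_output)]
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    sample = """64 bytes from 192.168.x.xxx: icmp_seq=10 ttl=64 time=0.139 ms
64 bytes from 192.168.x.xxx: icmp_seq=11 ttl=64 time=0.087 ms
64 bytes from 192.168.x.xxx: icmp_seq=37 ttl=64 time=0.137 ms
64 bytes from 192.168.x.xxx: icmp_seq=38 ttl=64 time=0.151 ms
"""
    for first, last in find_sequence_gaps(sample):
        print(f"missing icmp_seq {first} to {last} ({last - first + 1} replies)")
```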
2. Analyze a packet capture to validate whether there is an underlying network issue.
The packet capture shows that UDP traffic on port 12321 is flowing, even though the vmkping test above showed sequence 11 followed directly by sequence 37.
Run this command on one of the data nodes where vmnic4 is the uplink used for the vSAN VMkernel port:
# pktcap-uw --uplink vmnic4 --dir 0 --stage 1 --proto 0x11 -o - | tcpdump-uw -r - -nne
----- Output of the above command is as below -----
The Stage is Post.
The session filter IP protocol is 0x11.
pktcap: The output file is -.
pktcap: No server port specifed, select 21248 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 21248.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Accept...
pktcap: Vsock connection from port 1029 cid 2.
07:39:15.068063 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 178: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 136
07:39:16.068090 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 178: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 136
07:39:17.068136 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 258: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 216
07:39:17.068162 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 186: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 144
07:39:18.068157 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 258: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 216
07:39:18.068186 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 186: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 144
07:39:19.068208 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:20.068203 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:21.068238 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:22.068288 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:23.068326 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:24.068347 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:25.068365 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:26.068417 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 242: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 200
07:39:27.068432 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 466: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 424
07:39:28.068511 ##:##:##:##:##:90 > ##:##:##:##:##:93, ethertype IPv4 (0x0800), length 466: 192.168.x.xxx.12321 > 192.168.x.xxx.12321: UDP, length 424
A similar packet capture with an ICMP filter (--proto 0x01), taken on vmnic5, shows more drops:
# pktcap-uw --uplink vmnic5 --dir 0 --stage 0 --proto 0x01 -o - | tcpdump-uw -r - -nne
The name of the uplink is vmnic5.
The Stage is Pre.
The session filter IP protocol is 0x01.
pktcap: The output file is -.
pktcap: No server port specifed, select 42606 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 42606.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Accept...
pktcap: Vsock connection from port 1026 cid 2.
07:45:06.559790 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 98, length 64
07:45:07.561992 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 99, length 64
07:45:08.562521 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 100, length 64
07:45:09.564725 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 101, length 64
07:45:10.566928 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 102, length 64
07:45:11.569107 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 103, length 64
07:45:27.598571 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 119, length 64 <======== sequences 104 to 118 are missing
07:45:28.600526 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 120, length 64
07:45:29.602738 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 121, length 64
07:45:30.604959 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 122, length 64
07:45:31.607195 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 (0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: ICMP echo request, id 36438, seq 123, length 64
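The capture also shows how long each loss window lasted, which helps when correlating with switch or NIC logs. The sketch below pairs each sequence gap with the time between the surrounding echo requests. It assumes the tcpdump-uw text output was saved to a string and that the capture does not cross midnight; the function name is illustrative.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d+) .*ICMP echo request, id \d+, seq (\d+)"
)


def find_loss_windows(capture: str) -> list:
    """Return (first_missing_seq, last_missing_seq, seconds) for each gap."""
    samples = []
    for line in capture.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            samples.append((ts, int(m.group(2))))
    windows = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if s1 - s0 > 1:
            seconds = round((t1 - t0).total_seconds(), 3)
            windows.append((s0 + 1, s1 - 1, seconds))
    return windows


if __name__ == "__main__":
    sample = (
        "07:45:11.569107 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 "
        "(0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: "
        "ICMP echo request, id 36438, seq 103, length 64\n"
        "07:45:27.598571 ##:##:##:##:##:93 > ##:##:##:##:##:90, ethertype IPv4 "
        "(0x0800), length 98: 192.168.x.xxx > 192.168.x.xxx: "
        "ICMP echo request, id 36438, seq 119, length 64\n"
    )
    for first, last, seconds in find_loss_windows(sample):
        print(f"seq {first} to {last} missing over about {seconds} s")
```

For the capture above, this reports sequences 104 to 118 missing over roughly 16 seconds, which matches a one-second ping interval.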
NODE2# esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask     IPv4 Broadcast   Address Type  Gateway        DHCP DNS
----  -------------  ---------------  ---------------  ------------  -------------  --------
vmk0  10.12.xxx.xxx  255.xxx.xxx.xxx  10.12.xxx.255    STATIC        10.12.xxx.xxx  false
vmk2  192.168.x.xxx  255.xxx.xxx.x    192.168.xxx.255  STATIC        0.0.0.0        false
vmk3  192.168.x.xx   255.xxx.xxx.0    192.168.xxx.255  STATIC        0.0.0.0        false
NODE1# esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask     IPv4 Broadcast  Address Type  Gateway        DHCP DNS
----  -------------  ---------------  --------------  ------------  -------------  --------
vmk0  10.12.xxx.xxx  255.255.xxx.xxx  10.xx.xxx.255   STATIC        10.12.xxx.xxx  false
vmk2  192.168.x.xxx  255.255.xxx.xxx  192.168.x.xxx   STATIC        0.0.0.0        false
vmk3  192.168.x.xx   255.255.xxx.xxx  192.168.x.xxx   STATIC        0.0.0.0        false
Isolating one NIC (vmnic4 is administratively down, so vSAN traffic can only use vmnic5) shows 100% packet loss:
NODE2# esxcli network nic list
Name    PCI Device     Driver   Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  -------------  -------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------------------------
vmnic0  0000:18:##.##  ntg3     Up            Up           1000   Full    ##:##:##:##:##:0c  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1  0000:18:##.##  ntg3     Up            Down         0      Half    ##:##:##:##:##:0d  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2  0000:19:##.##  ntg3     Up            Up           1000   Full    ##:##:##:##:##:0e  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3  0000:19:##.##  ntg3     Up            Down         0      Half    ##:##:##:##:##:0f  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic4  0000:87:##.##  qedentv  Down          Down         0      Half    ##:##:##:##:##:2c  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic5  0000:87:##.##  qedentv  Up            Up           10000  Full    ##:##:##:##:##:2d  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
NODE2# vmkping -I vmk2 192.168.x.xxx -c 100 -i 0.005
PING 192.168.x.xxx (192.168.x.xxx): 56 data bytes
--- 192.168.x.xxx ping statistics ---
100 packets transmitted, 0 packets received, 100% packet loss
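When isolating uplinks one at a time, it helps to confirm which NICs are actually able to carry traffic before each vmkping run. The sketch below parses saved 'esxcli network nic list' output and lists the uplinks of a given driver that are both administratively up and link up. It assumes the whitespace-separated column layout shown above; the function names are illustrative.

```python
def parse_nic_list(output: str) -> list:
    """Parse data rows of 'esxcli network nic list' into dictionaries."""
    nics = []
    for line in output.splitlines():
        if not line.startswith("vmnic"):
            continue  # skip the header and separator lines
        # The first nine columns are single tokens; the rest is the description.
        parts = line.split(None, 9)
        if len(parts) < 9:
            continue
        name, _pci, driver, admin, link, speed, _duplex, _mac, mtu = parts[:9]
        nics.append({
            "name": name,
            "driver": driver,
            "admin": admin,
            "link": link,
            "speed": int(speed),
            "mtu": int(mtu),
        })
    return nics


def usable_uplinks(nics: list, driver: str) -> list:
    """Return names of NICs using the given driver that are admin Up and link Up."""
    return [
        n["name"] for n in nics
        if n["driver"] == driver and n["admin"] == "Up" and n["link"] == "Up"
    ]


if __name__ == "__main__":
    sample = (
        "Name PCI Device Driver Admin Status Link Status Speed Duplex MAC Address MTU Description\n"
        "vmnic4 0000:87:##.## qedentv Down Down 0 Half ##:##:##:##:##:2c 1500 QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter\n"
        "vmnic5 0000:87:##.## qedentv Up Up 10000 Full ##:##:##:##:##:2d 1500 QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter\n"
    )
    print(usable_uplinks(parse_nic_list(sample), "qedentv"))
```

With the listing above, only vmnic5 is returned, so the 100% loss can be attributed to that uplink or its path.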
After bringing up the other NIC (vmnic4) and taking the faulty NIC (vmnic5) down, no packets are lost. To confirm which uplink the VMkernel port is using, run esxtop and press "n" to open the network view, which shows the association between NICs and VMkernel ports.
NODE2# esxcli network nic up -n vmnic4
NODE2# esxcli network nic list
Name    PCI Device    Driver   Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  -------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------------------------
vmnic0  0000:18:##.#  ntg3     Up            Up           1000   Full    ##:##:##:##:##:0c  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1  0000:18:##.#  ntg3     Up            Down         0      Half    ##:##:##:##:##:0d  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2  0000:19:##.#  ntg3     Up            Up           1000   Full    ##:##:##:##:##:0e  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3  0000:19:##.#  ntg3     Up            Down         0      Half    ##:##:##:##:##:0f  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic4  0000:87:##.#  qedentv  Up            Up           10000  Full    ##:##:##:##:##:2c  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic5  0000:87:##.#  qedentv  Up            Up           10000  Full    ##:##:##:##:##:2d  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
NODE2# esxcli network nic down -n vmnic5
NODE2# vmkping -I vmk2 192.xxx.x.xxx -c 100 -i 0.005
PING 192.168.x.xxx (192.xxx.x.xxx): 56 data bytes
64 bytes from 192.168.x.xxx: icmp_seq=0 ttl=64 time=0.148 ms
64 bytes from 192.168.x.xxx: icmp_seq=1 ttl=64 time=0.069 ms
64 bytes from 192.168.x.xxx: icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from 192.168.x.xxx: icmp_seq=3 ttl=64 time=0.072 ms
64 bytes from 192.168.x.xxx: icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from 192.168.x.xxx: icmp_seq=5 ttl=64 time=0.061 ms
The NIC was already using the latest driver:
NODE2# vmkload_mod -s qedentv
vmkload_mod module information
input file: /usr/lib/vmware/vmkmod/qedentv
Version: 3.9.31.2-1OEM.670.0.0.8169922
Build Type: release
License: QLogic_Proprietary
Required name-spaces:
com.vmware.vmkapi#v2_5_0_0
Parameters:
Refer to KB: Determining Network/Storage firmware and driver version in ESXi
3. If the vSAN network IP addresses are in different subnets, manually configure static routes in the routing table by following the KB article below:
Configuring static routes for vmkernel ports on an ESXi host
Refer to KB: Network adapter (vmnic) is down or fails with a Failed Criteria Code
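For check 3, whether static routes are needed depends on whether the vSAN VMkernel addresses of two nodes fall in the same subnet. The sketch below makes that comparison with Python's standard ipaddress module. The addresses and netmask are made-up examples, since the values in this article are masked, and the function name is illustrative.

```python
import ipaddress


def same_subnet(local_ip: str, netmask: str, remote_ip: str) -> bool:
    """Return True if remote_ip lies in the subnet defined by local_ip and netmask."""
    # strict=False lets a host address stand in for its network.
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(remote_ip) in network


if __name__ == "__main__":
    # Hypothetical vmk2 addresses taken from 'esxcli network ip interface ipv4 get'.
    print(same_subnet("192.168.10.11", "255.255.255.0", "192.168.10.12"))
    print(same_subnet("192.168.10.11", "255.255.255.0", "192.168.20.12"))
```

If the function returns False for a pair of vSAN VMkernel addresses, the traffic between them is routed, and a static route as described in the referenced KB article is likely required.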