The Maximum Transmission Unit (MTU) health check, also called "MTU check (ping with large packet size)", complements the basic connectivity check for vMotion traffic.
The MTU Check warning can be caused by an MTU mismatch between the vSphere environment and the physical switch.
The following error is reported in Skyline Health:
Cluster > Monitor > Skyline health >
vMotion tasks take longer than expected or fail.
1. Identify the VMkernel adapter and the vmnic used for vMotion traffic, and the MTU configured on each.
Make sure the MTU is consistently configured across the cluster.
Identify the vMotion network on the ESXi host.
[root@esxi2:~] esxcfg-vmknic -l | grep vMotion
vmk1 vMotion IPv4 ###.###.##.### 255.255.255.0 ###.###.#.### 00:50:56:##:##:## 1500 65535 true STATIC defaultTcpipStack
vmk1 vMotion IPv6 fe80::250:####:####:#### 64 00:50:56:##:##:## 1500 65535 true STATIC, PREFERRED defaultTcpipStack
The output above indicates that vmk1 is used for vMotion traffic and is configured with an MTU of 1500.
Check the MTU setting on the vSwitch.
[root@esxi2:~] esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         2520        8           128               1500    vmnic0,vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network            0        0           vmnic0,vmnic1
  VMkernel-Test         0        1           vmnic0,vmnic1
  vMotion               0        1           vmnic1
  Management Network    0        1           vmnic0,vmnic1
Check the MTU on the vmnics:
[root@esxi2:~] esxcfg-nics -l
Name    PCI          Driver   Link Speed     Duplex MAC Address       MTU  Description
vmnic0  0000:0b:00.0 nvmxnet3 Up   10000Mbps Full   00:50:56:##:##:## 1500 VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic1  0000:13:00.0 nvmxnet3 Up   10000Mbps Full   00:50:56:##:##:## 1500 VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic2  0000:1b:00.0 nvmxnet3 Up   10000Mbps Full   00:50:56:##:##:## 1500 VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic3  0000:04:00.0 nvmxnet3 Up   10000Mbps Full   00:50:56:##:##:## 1500 VMware Inc. vmxnet3 Virtual Ethernet Controller
The output above indicates that vmnic1, the uplink that carries the vmk1 vMotion traffic, is set to an MTU of 1500.
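The consistency check in this step can be sketched in a few lines of Python. This is only an illustration, not a VMware tool: the function name `find_mtu_mismatches` is hypothetical, and the MTU values are typed in by hand from the command output rather than collected from the host.

```python
from collections import Counter


def find_mtu_mismatches(mtus, expected=None):
    """Return the components whose MTU differs from the expected value.

    mtus maps a component name (vmk, vSwitch, vmnic) to its configured MTU.
    If expected is omitted, the most common MTU in the mapping is used.
    """
    if not mtus:
        return {}
    if expected is None:
        expected = Counter(mtus.values()).most_common(1)[0][0]
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}


# Values read manually from the esxi2 output shown in this step.
esxi2 = {"vmk1": 1500, "vSwitch0": 1500, "vmnic1": 1500}
print(find_mtu_mismatches(esxi2))

# Hypothetical host where only the vSwitch was changed to jumbo frames.
other = {"vmk1": 1500, "vSwitch0": 9000, "vmnic1": 1500}
print(find_mtu_mismatches(other))
```

The same comparison has to hold for every host in the cluster and for the physical switch ports, which this sketch does not query.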
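2. Check for packet drops by running vmkping on the vMotion network with a 1472-byte payload. This is the largest ICMP payload that fits in a 1500-byte MTU (1500 minus 20 bytes of IPv4 header and 8 bytes of ICMP header); the -d option prevents fragmentation.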
vmkping -I <vmotion_vmk> <vMotion_IP_of_another_host> -d -s 1472 -c 300
Example: vmkping -I vmk1 ###.###.#.### -d -s 1472 -c 100
PING ###.###.#.### (###.###.#.###): 1472 data bytes
1480 bytes from ###.###.#.###: icmp_seq=0 ttl=64 time=1.517 ms
1480 bytes from ###.###.#.###: icmp_seq=2 ttl=64 time=0.508 ms
1480 bytes from ###.###.#.###: icmp_seq=3 ttl=64 time=0.634 ms
--- ###.###.#.### ping statistics ---
10 packets transmitted, 10 packets received, 75% packet loss
round-trip min/avg/max = 0.487/0.685/1.517 ms
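The payload size passed to -s and the packet loss figure can be derived with simple arithmetic. The sketch below assumes an IPv4 header without options (20 bytes) and a standard ICMP echo header (8 bytes); the function names and the sample counts are hypothetical and are not taken from the output above.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def packet_loss_percent(transmitted, received):
    """Percentage of echo requests that got no reply."""
    if transmitted <= 0:
        raise ValueError("transmitted must be greater than zero")
    return 100.0 * (transmitted - received) / transmitted


print(max_ping_payload(1500))        # value for -s with a standard MTU
print(max_ping_payload(9000))        # value for -s with jumbo frames
print(packet_loss_percent(100, 25))  # hypothetical run: 100 sent, 25 answered
```

For a 9000-byte MTU the same calculation gives 8972, which is the value to use with -s when jumbo frames are configured end to end.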
VMware ESXi Version: 7.x
VMware ESXi Version: 8.x
VMware vSAN : 7.x
VMware vSAN : 8.x
A CRC (Cyclic Redundancy Check) error occurs when data corruption is detected during transmission over a physical network: the checksum calculated for the received data does not match the expected value. Typical causes are faulty cables, SFPs, Network Interface Cards (NICs), or a faulty SAN switch.
Run the following command against the vmnic that is used as the vSAN uplink.
$ esxcli network nic stats get -n <vmnic#>
NIC statistics for vmnic1
   Packets received: 5721419054
   Packets sent: 6897046642
   Bytes received: 2845905140057
   Bytes sent: 4960832527165
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 133976174
   Broadcast packets received: 49838376
   Multicast packets sent: 390456
   Broadcast packets sent: 39197
   Total receive errors: 4028
   Receive length errors: 36
   Receive over errors: 0
   Receive CRC errors: 1996
   Receive frame errors: 0
CRC errors are also captured in /var/run/log/hostd.log.
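As a rough illustration of how to read this output, the following Python sketch parses the "name: value" lines and lists the error counters that are not zero. It assumes the output has been saved as text; the helper names are hypothetical, and the sample is a shortened copy of the statistics shown above.

```python
def parse_nic_stats(text):
    """Parse 'Name: value' lines from NIC statistics output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        # Skip the title line and anything that is not a numeric counter.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_error_counters(stats):
    """Return only the counters with 'errors' in their name and a value above zero."""
    return {name: count for name, count in stats.items()
            if "errors" in name.lower() and count > 0}


SAMPLE = """\
NIC statistics for vmnic1
   Packets received: 5721419054
   Receive packets dropped: 0
   Total receive errors: 4028
   Receive length errors: 36
   Receive over errors: 0
   Receive CRC errors: 1996
   Receive frame errors: 0
"""

print(nonzero_error_counters(parse_nic_stats(SAMPLE)))
```

A nonzero "Receive CRC errors" value on its own only shows that errors occurred at some point since the counters were last reset; whether the problem is ongoing depends on whether the value keeps growing.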
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102792]: [Originator@6876 sub=Statssvc.StatsCollector] Error stats for pnic: vmnic#
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> errorsRx: 178317
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxLengthErrors: 2001
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxCRCErrors: 88158
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: -->
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102792]: [Originator@6876 sub=Statssvc.StatsCollector] Error stats for pnic: vmnic#
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> errorsRx: 178313
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxLengthErrors: 2001
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxCRCErrors: 88156
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: -->
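If hostd.log is copied off the host for analysis, the counter values can be pulled out with a regular expression. The sketch below is an assumption-laden example: the function name `extract_counter` is hypothetical, and the sample lines reuse the redacted placeholders from the excerpt above rather than real timestamps.

```python
import re


def extract_counter(log_text, counter="RxCRCErrors"):
    """Return every value logged for the given counter, in log order."""
    pattern = re.compile(r"\b" + re.escape(counter) + r":\s*(\d+)")
    return [int(match.group(1)) for match in pattern.finditer(log_text)]


# Shortened copy of the excerpt above, with the same redacted timestamps.
SAMPLE_LOG = """\
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> errorsRx: 178317
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxCRCErrors: 88158
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> errorsRx: 178313
DateTxx:xx:xx.xxxx Wa(164) Hostd[2102782]: --> RxCRCErrors: 88156
"""

print(extract_counter(SAMPLE_LOG))
print(extract_counter(SAMPLE_LOG, "errorsRx"))
```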
To check whether the error counters are still increasing, run the command repeatedly:
$ watch esxcli network nic stats get -n <vmnic#>
This issue originates outside the vSphere environment. Engage the server hardware vendor and the SAN switch vendor to identify and replace the defective hardware.
Note: The CRC error count stops increasing once the defective hardware is replaced. Rebooting the ESXi host resets the CRC error count.
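Because the counter is cumulative and is reset by a reboot, two readings taken some time apart say more than a single value. A minimal sketch of that comparison follows; the function name and the readings in the usage lines are hypothetical.

```python
def crc_trend(previous, current):
    """Classify two readings of the 'Receive CRC errors' counter.

    A lower second reading is treated as a counter reset, for example
    after the ESXi host was rebooted.
    """
    if current < previous:
        return "reset"
    if current == previous:
        return "stable"
    return "increasing"


# Hypothetical readings taken a few minutes apart.
print(crc_trend(1996, 2010))  # errors are still being received
print(crc_trend(1996, 1996))  # no new errors since the first reading
print(crc_trend(1996, 0))     # counter was reset
```

A "stable" result after the hardware replacement is the expected outcome; an "increasing" result suggests that the faulty component is still in the path.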
Once the issue has been resolved, rerun the vSAN Health Check tests to confirm that the MTU Check (ping with large packet size) warning is no longer present.