To determine which NIC is dropping packets:
- Click one of the hosts listed under the error message to view the performance metrics for that host. Repeat for every host listed.
- Click Show All Metrics. This opens the backend performance details for the host.
- Click Physical Adapters and review the adapters used by vSAN. Check the graphs that show the packet loss rate.
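The packet loss rate in those graphs is, roughly, the fraction of packets dropped during a sampling interval. If you sample raw adapter counters yourself, a comparable rate can be computed from two snapshots, as in the minimal sketch below. The `NicCounterSample` type, its field names, and the sample values are hypothetical, and the formula (drops divided by received plus dropped) is an assumption; the exact calculation behind the vSAN graphs may differ.

```python
from dataclasses import dataclass


@dataclass
class NicCounterSample:
    """Hypothetical snapshot of cumulative receive counters for one NIC."""
    rx_packets: int  # packets successfully received (cumulative)
    rx_dropped: int  # packets dropped on receive (cumulative)


def drop_rate(earlier: NicCounterSample, later: NicCounterSample) -> float:
    """Fraction of packets dropped between two counter snapshots.

    Assumes the counters are cumulative, did not wrap or reset between
    the snapshots, and that every packet was either received or dropped.
    """
    received = later.rx_packets - earlier.rx_packets
    dropped = later.rx_dropped - earlier.rx_dropped
    if received < 0 or dropped < 0:
        raise ValueError("counters decreased; were they reset?")
    total = received + dropped
    return dropped / total if total else 0.0


# Illustrative values: 100 drops out of 100,000 packets in the interval.
t0 = NicCounterSample(rx_packets=1_000_000, rx_dropped=50)
t1 = NicCounterSample(rx_packets=1_099_900, rx_dropped=150)
print(f"drop rate: {drop_rate(t0, t1):.4%}")
```

Working from deltas rather than raw totals matters because the counters accumulate since the adapter was initialized, so a high lifetime total can hide whether drops are still happening.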
What is the impact of this issue?
There is no functional impact on vSAN: it uses TCP, which recovers from packet loss by retransmitting lost segments. However, retransmission adds latency. In particular, both the average latency your application sees and its standard deviation are likely to be higher because of the dropped packets. To get the best performance from your vSAN cluster, you should fix the issue.
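A toy model shows why drops raise both the mean and the spread of latency. Assume each I/O normally takes a base latency, and with probability p its data is lost once and waits an extra retransmission delay R. The extra delay is then a scaled Bernoulli variable, so the mean grows by pR and the standard deviation becomes R·sqrt(p(1 − p)). The 0.5 ms base latency and 200 ms retransmission delay below are illustrative assumptions, not vSAN or ESXi measurements, and the model ignores repeated losses, congestion control, and queuing.

```python
import math


def latency_stats(base_ms: float, loss_prob: float, retransmit_ms: float) -> tuple[float, float]:
    """Mean and standard deviation of per-I/O latency under a toy model.

    Assumed model: each I/O takes base_ms; with probability loss_prob it is
    lost exactly once and incurs an extra retransmit_ms delay.
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must be between 0 and 1")
    mean = base_ms + loss_prob * retransmit_ms
    # A Bernoulli(p) variable scaled by R has variance p * (1 - p) * R^2.
    std = retransmit_ms * math.sqrt(loss_prob * (1.0 - loss_prob))
    return mean, std


for p in (0.0, 0.001, 0.01):
    mean, std = latency_stats(base_ms=0.5, loss_prob=p, retransmit_ms=200.0)
    print(f"loss={p:.1%}  mean={mean:.2f} ms  std={std:.2f} ms")
```

Even a 0.1% loss rate noticeably raises the mean in this model, and the standard deviation grows much faster than the mean, which is why occasional drops show up as latency spikes rather than a uniform slowdown.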
What are possible ways to fix this issue?
A physical NIC can drop packets for several reasons. Review the following issues and apply the corresponding remediation steps. If the issue persists, contact VMware support.
Here is a list of possible issues:
- Incompatible firmware and driver versions. Check that the NIC firmware and driver versions are a supported combination for your ESXi release, as listed in the VMware Compatibility Guide. If possible, update to the latest supported versions of the firmware and device driver.
- Incorrectly set MTU. An MTU mismatch between network ports, such as the NIC port, the physical switch port, and the NIC ports on other hosts in this cluster, can cause intermittent packet drops. Check that the MTU is consistent across all ports in the network. The default MTU is 1500. If a NIC is dedicated to vSAN, we recommend using jumbo frames with an MTU of 9000.
- TCP segmentation offload (TSO) and large receive offload (LRO). If TSO and LRO are not enabled, you may not be able to reach the line rate your NIC supports. We recommend enabling TSO and LRO on all NIC ports dedicated to vSAN traffic. Refer to KB 2055140 for instructions on enabling TSO and LRO.
- NIC teaming policy. Several NIC teaming options are available. Some teaming policies require configuration on the physical switch and depend on the quality of the switch hardware. Some policies, such as Link Aggregation, also require a solid understanding of networking. Unless you are comfortable and experienced with network switch configuration, VMware recommends avoiding these policies. If you are unsure which policy to choose, a basic active/standby NIC team with explicit failover order is a good place to start.
- Multiple VMkernel adapters. Multiple VMkernel adapters may be used when configuring vSAN networks. This configuration is typically used when customers want to implement an “air gap” in their vSAN networking. An air gap means that a failure on one network path does not affect the other network path, so any part of one path can fail and the remaining path can still carry the traffic. This is achieved by configuring multiple vSAN-enabled VMkernel NICs.
- NIC teaming failback. If you use NIC teaming, packet drops can occur during NIC teaming failback. KB 1014325 discusses this issue at length.
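The MTU check above amounts to confirming that every port along the vSAN path reports the same value. The minimal sketch below flags ports whose MTU differs from the most common value, applying the uniform-MTU rule described above. The port labels and MTU values are hypothetical inventory data that you would gather yourself (for example, from the vSphere Client and your switch configuration); the script does not query vSphere or any switch.

```python
from collections import Counter


def find_mtu_mismatches(port_mtus: dict[str, int]) -> dict[str, int]:
    """Return the ports whose MTU differs from the most common MTU.

    port_mtus maps a (hypothetical) port label to its configured MTU.
    The most common value is treated as the intended MTU, and every
    other port is reported as a mismatch.
    """
    if not port_mtus:
        return {}
    expected, _ = Counter(port_mtus.values()).most_common(1)[0]
    return {port: mtu for port, mtu in port_mtus.items() if mtu != expected}


# Hypothetical inventory for a three-host vSAN network using jumbo frames.
ports = {
    "host1/vmnic1": 9000,
    "host2/vmnic1": 9000,
    "host3/vmnic1": 1500,  # left at the default MTU
    "switch/port-12": 9000,
}
for port, mtu in find_mtu_mismatches(ports).items():
    print(f"MTU mismatch: {port} is {mtu}")
```

Treating the majority value as the intended MTU is a heuristic: if most ports are misconfigured, the script reports the correctly configured ones instead, so compare its output against the MTU you actually intend to use.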