This article assists with troubleshooting cases where VMs running on ESXi appear to be dropping packets or report packet drops in their statistics.
VMware vSphere ESXi
VMware NSX-T 3.x
VMware NSX 4.x
VMs connected to an NSX segment or a Distributed Switch DVPortGroup can drop packets for several different reasons.
Some of these reasons include:
1. CPU contention on ESXi
VMs running on ESXi hosts use CPU cycles to process network packets. When an ESXi host is oversubscribed for CPU, a VM may be delayed in getting access to CPU cycles. This may inhibit its ability to process packets as quickly as they arrive, leading to packet drops and out-of-buffer conditions.
2. Storage latency on ESXi
Both the hypervisor and the VMs running on it may be adversely affected by increased storage latency on the datastores/LUNs where the VMs are hosted.
3. Packet drops caused by the NSX vDefend Distributed Firewall
While these drops may be expected (depending on the firewall rule configuration), the firewall is part of the datapath and can cause packet drops, so evaluating the relevant firewall rules is part of any such investigation.
4. Packet drops associated with the physical NIC adapter on the ESXi host
The physical NIC adapter of the ESXi host plays an important role in the datapath and can sometimes drop packets.
Troubleshooting these packet drops involves examining the entire datapath between the two VMs that are experiencing drops. Identify a pair of IPs or VMs between which packets are being dropped, and then identify every point on the datapath, including L2 handoffs and L3 or higher hops.
Tools such as traceflow in NSX or traceflow in vRNI are useful for identifying the datapath, and in some cases they even identify the cause of the drops. If these tools do not identify the cause, then performing packet captures simultaneously at two or more points on the datapath will help determine whether packets are making it across in both directions.
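Once simultaneous captures exist at two points, comparing them can narrow down where packets disappear. The following is a minimal sketch of that comparison, assuming a per-packet identifier (for example an ICMP sequence number) has already been extracted from each capture. The function name, the capture points, and the sample values are hypothetical and only illustrate the idea.

```python
def missing_downstream(upstream_ids, downstream_ids):
    """Return identifiers seen at the upstream capture point but not downstream.

    The result keeps the upstream capture order.
    """
    seen = set(downstream_ids)
    return [pid for pid in upstream_ids if pid not in seen]


# Hypothetical identifiers extracted from two captures taken at the same time,
# e.g. one at the source VM's vNIC and one at the destination host's uplink.
at_source_vnic = [1, 2, 3, 4, 5, 6]
at_dest_uplink = [1, 2, 4, 6]

lost = missing_downstream(at_source_vnic, at_dest_uplink)
print(lost)  # [3, 5]
```

Packets listed in the result were seen at the first capture point but not at the second, so the drop occurred somewhere between the two. Repeating the comparison for the return direction covers both directions of the flow.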
To identify CPU contention on ESXi, run the esxtop utility from the CLI. In esxtop, pressing the "d" key (disk adapter view) or the "u" key (disk device view) helps identify storage latency. For more on identifying storage latency, see this link:
To identify whether the firewall is dropping packets, refer to the Security chapter of the NSX Administration Guide here:
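esxtop reports CPU ready time directly as a percentage. When only a CPU ready summation value in milliseconds is available (as shown in vCenter performance charts), it can be converted to a percentage for comparison. The sketch below assumes the commonly documented conversion, summation / (interval in seconds * 1000) * 100, with a 20-second interval for real-time charts. The function name and the sample value are illustrative.

```python
def cpu_ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (milliseconds) into a percentage.

    interval_s is the chart update interval in seconds; 20 is assumed here,
    which corresponds to real-time charts.
    """
    return ready_ms * 100 / (interval_s * 1000)


# Hypothetical sample: 1000 ms of ready time within a 20-second interval.
print(cpu_ready_percent(1000))  # 5.0
```

A sustained high ready percentage suggests that the VM is waiting for CPU time, which is the condition described under CPU contention above.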
Physical NIC adapters on ESXi hosts report packet drops via their drivers. Log in to the ESXi host via CLI (SSH) and run the following command, replacing vmnicX with the name of the physical NIC:
esxcli network nic stats get -n vmnicX
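The counters in this output are cumulative, so a single reading does not show whether drops are still occurring. The sketch below compares two readings taken some time apart and reports how much the drop counters increased. It assumes the output consists of "Label: number" lines and that the drop counters are labelled "Receive packets dropped" and "Transmit packets dropped"; the sample text is illustrative rather than captured from a real host, so adjust the labels to match the actual output.

```python
import re

# Assumed counter labels; verify them against the real command output.
DROP_FIELDS = ("Receive packets dropped", "Transmit packets dropped")


def parse_nic_stats(text):
    """Parse 'Label: number' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def drop_deltas(before, after):
    """Return the increase of each drop counter between two readings."""
    return {f: after.get(f, 0) - before.get(f, 0) for f in DROP_FIELDS}


# Illustrative readings, not real data.
SAMPLE_1 = """NIC statistics for vmnic0
   Packets received: 1000
   Packets sent: 900
   Receive packets dropped: 10
   Transmit packets dropped: 0
"""
SAMPLE_2 = """NIC statistics for vmnic0
   Packets received: 1500
   Packets sent: 1300
   Receive packets dropped: 25
   Transmit packets dropped: 0
"""

deltas = drop_deltas(parse_nic_stats(SAMPLE_1), parse_nic_stats(SAMPLE_2))
print(deltas)
```

A counter that keeps increasing between readings indicates that the physical NIC is still dropping packets, while a non-zero but unchanged counter points to drops that occurred earlier.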