Troubleshooting packet drops between VMs

Article ID: 374588

Products

VMware vSphere ESXi

VMware NSX

Issue/Introduction

This article helps troubleshoot VMs running on ESXi that appear to be dropping packets or that report packet drops in their statistics.

Environment

VMware vSphere ESXi

VMware NSX-T 3.x

VMware NSX 4.x

Cause

VMs connected to an NSX segment or a Distributed Switch DVPortGroup can drop packets for several different reasons.

Some of these reasons include:
1. CPU contention on ESXi

VMs running on ESXi hosts use CPU cycles to process network packets. When an ESXi host is oversubscribed for CPU, a VM may have to wait for access to CPU cycles. This can inhibit its ability to process packets as quickly as they arrive, leading to packet drops and out-of-buffer conditions.

2. Storage latency on ESXi

Both the hypervisor and the VMs running on it can be adversely affected by increased storage latency on the datastores/LUNs where the VMs are hosted.

3. Packet drops caused by the NSX vDefend Distributed Firewall

These drops may be expected, depending on the firewall rule configuration. Even so, the firewall is part of the datapath and can drop packets, so evaluating the relevant firewall rules is part of any such investigation.

4. Packet drops associated with the physical NIC adapter on the ESXi host

The physical NIC adapter of the ESXi host plays an important role in the datapath and can sometimes drop packets.

Resolution

Troubleshooting these packet drops involves examining the entire datapath between the two affected VMs. Identify a pair of IPs or VMs between which packet drops occur, and then identify every point along the datapath between them, including L2 handoffs and L3 or higher hops.

Tools such as Traceflow in NSX or in vRNI are useful for identifying the datapath and, in some cases, even the cause of the drops. If these tools do not identify the cause, performing packet captures simultaneously at two or more points along the path will help determine whether packets are making it across in both directions.
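As one possible way to capture at two points on the same ESXi host, the sketch below prints pktcap-uw command lines for a VM's switch port and for a physical uplink, so they can be reviewed and then run in separate SSH sessions. The function name, output paths, and example values are hypothetical and not part of this article. The port ID would typically come from net-stats -l, and option support (for example --dir 2 for both directions) varies by ESXi release, so confirm with pktcap-uw -h on the host.

```shell
# Hypothetical helper: print (not run) pktcap-uw commands for two capture points.
# $1 = switch port ID of the VM vNIC, $2 = uplink name, $3 = peer IP to filter on.
print_capture_cmds() {
    port_id=$1 uplink=$2 peer_ip=$3
    # Capture at the VM's virtual switch port, both directions (assumes --dir 2 support).
    echo "pktcap-uw --switchport $port_id --dir 2 --ip $peer_ip -o /tmp/vnic_$port_id.pcap"
    # Capture at the physical uplink, both directions.
    echo "pktcap-uw --uplink $uplink --dir 2 --ip $peer_ip -o /tmp/uplink_$uplink.pcap"
}

# Example with placeholder values:
#   print_capture_cmds 67108875 vmnic0 192.0.2.10
```

On NSX overlay segments, traffic at the uplink is encapsulated, so a filter on the VM's IP may not match there. In that case, drop the --ip filter on the uplink capture or filter on the tunnel endpoint addresses instead.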

To identify CPU contention on ESXi, run the esxtop utility from the CLI. Within esxtop, press the "d" key (disk adapter view) or the "u" key (disk device view) to identify storage latency. For more on identifying storage latency, see this link:
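When reading the esxtop CPU view, the %RDY value for a VM is usually normalized by its vCPU count before judging contention. The sketch below does that arithmetic for a value read off the screen. The function name is hypothetical, and the 10% per-vCPU threshold is an assumed rule of thumb rather than a figure from this article, so adjust it to your own baseline.

```shell
# Hypothetical helper: flag likely CPU contention from an esxtop reading.
# $1 = %RDY shown for the VM, $2 = number of vCPUs in the VM,
# $3 = optional per-vCPU threshold in percent (assumed default: 10).
check_cpu_ready() {
    awk -v rdy="$1" -v vcpus="$2" -v limit="${3:-10}" 'BEGIN {
        if (vcpus + 0 <= 0) { print "invalid vCPU count"; exit 1 }
        per_vcpu = rdy / vcpus                 # normalize ready time per vCPU
        status = (per_vcpu > limit + 0) ? "contention-suspected" : "ok"
        printf "%.1f%% ready per vCPU: %s\n", per_vcpu, status
    }'
}

# Example: a 4-vCPU VM showing %RDY of 48
#   check_cpu_ready 48 4
# To record samples for later review instead of watching interactively:
#   esxtop -b -d 5 -n 12 > /tmp/esxtop.csv
```

In the disk views, the DAVG/cmd, KAVG/cmd, and GAVG/cmd columns report device, kernel, and guest-observed latency per command in milliseconds.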

To identify if the firewall is dropping packets, refer to the Security chapter of the NSX Administration Guide here:

Physical NIC adapters on ESXi hosts report packet drops via their drivers. Log in to the ESXi host via SSH and run the following command, replacing vmnicX with the name of the adapter (for example, vmnic0):
esxcli network nic stats get -n vmnicX
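The fields of most interest in this output are the drop and error counters. As a minimal sketch, assuming the output uses a "Name: value" layout with counter names containing "dropped" or "errors" (verify against the output on your build), the hypothetical helper below reduces the output to the non-zero drop and error counters:

```shell
# Hypothetical helper: print only non-zero drop/error counters from
# "esxcli network nic stats get" output supplied on stdin.
nic_drop_summary() {
    awk -F': *' '
        # Match counter names containing "dropped" or "errors" (assumed naming).
        /dropped|errors/ {
            name = $1
            sub(/^[ \t]+/, "", name)           # strip leading indentation
            if ($2 + 0 > 0) print name "=" $2  # keep non-zero counters only
        }
    '
}

# On an ESXi host (vmnicX is a placeholder for the real adapter name):
#   esxcli network nic stats get -n vmnicX | nic_drop_summary
```

These counters are cumulative, so comparing two samples taken some time apart shows whether drops are still increasing.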