Packet loss for Guest VLAN Tagged traffic with Intel card with ENS and VMDQ features

Article ID: 402126


Products

VMware NSX
VMware vSphere ESX 7.x

Issue/Introduction

Constant packet loss occurs when the following conditions are met:

  • A VM with 802.1Q enabled (Guest VLAN Tagging port group or trunked segment)
  • ENS mode is used (so the application running on the VM most likely uses DPDK in order to be optimised)
  • An Intel NIC whose driver has the VMDQ loopback feature is used (e.g. Intel X710, E810, XXV710). The issue does not occur with other vendors' ENS drivers (e.g. Mellanox)
  • The application on the VM changes the VLAN tag

This is a 3rd party (Intel/OEM) driver issue.

Environment

  • VMware vSphere ESXi 7.0
  • VMware NSX-T Data Center 3.2.3

Cause

The VMDQ loopback feature used by Intel NICs can cause connectivity issues when traffic uses a different VLAN in each direction (in this specific case, the outgoing traffic used one VLAN while the incoming traffic used another). Normally, when both directions use the same VLAN, the traffic is kept within the same ESXi host. In this case the traffic used different VLANs and was supposed to traverse the uplinks, but troubleshooting produced two contradictory observations:

Troubleshooting steps

  • Captures taken on the uplink with the following commands showed the packets going out:
    • pktcap --uplink vmnicX --capture EnsPortReaderRx -w outfile.pcap
    • pktcap --uplink vmnicX --capture EnsPortWriterTx -w outfile.pcap 
    • pktcap --uplink vmnicX --dir 2 outfile.pcap 
  • A SPAN port endpoint (Suricata and/or a Linux Live ISO) attached to the switch showed that the packets were not actually going out, contrary to what the ESXi packet capture commands reported. The switch was configured to send mirrored traffic from the port after all N-VDS uplinks except one had been disabled.
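To compare what the ESXi capture and the SPAN endpoint capture each contain, it can help to tally frames per 802.1Q VLAN ID in both files. The following is a minimal sketch, not a supported tool. It assumes both captures are in the classic pcap format (not pcapng) with Ethernet framing, and the function name `vlan_counts` is hypothetical.

```python
import struct
from collections import Counter

# Classic pcap magic numbers mapped to the byte order of the file.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def vlan_counts(data: bytes) -> Counter:
    """Count frames per 802.1Q VLAN ID in classic pcap bytes.

    Frames without an 802.1Q tag are counted under the key None.
    Assumption: the link type is Ethernet; it is not verified here.
    """
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    counts = Counter()
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        ethertype = struct.unpack_from("!H", frame, 12)[0]
        if ethertype == 0x8100 and len(frame) >= 18:
            tci = struct.unpack_from("!H", frame, 14)[0]
            counts[tci & 0x0FFF] += 1  # low 12 bits of the TCI are the VLAN ID
        else:
            counts[None] += 1
    return counts
```

Running `vlan_counts(open(path, "rb").read())` on the ESXi capture and on the SPAN capture, and then comparing the two counters, gives a quick view of which VLAN IDs appear in one capture but not in the other.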

Further investigation showed that the VMDQ feature prevents this kind of traffic from working properly even when it is disabled. You may also have problems disabling this feature with the intnetcli extension for esxcli, as outlined in this Dell KB:
https://www.dell.com/support/kbdoc/en-us/000223462/intel-e810-c-adapter-turns-off-vmdq-loopback-via-intnetcli-failed

Also, newer versions of the intnetcli extension may not be available for ESXi 7.0 on the Intel website, even where they are mentioned.


Resolution

As mentioned, turning off the VMDQ feature does not seem to solve this specific problem. The feature is disabled by default with i40en 2.9.2 or later and icen 1.14.2 or later, so with those drivers you do not even need to disable it with the intnetcli extension beforehand. Until the problem is fixed on the driver side, the only alternatives are to avoid 802.1Q in the vApp encapsulation or, if the application has this specific VLAN requirement, to use a non-Intel NIC. A driver that prevents this behaviour is not yet available.
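The version thresholds above can be checked with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the thresholds are the ones quoted in this article, and the version string is assumed to start with dot-separated numbers (any trailing build suffix is ignored). A True result only means that the feature is already off by default. It does not mean the packet loss is resolved.

```python
import re

# Thresholds quoted in this article: from these driver versions on,
# the VMDQ loopback feature is disabled by default.
DEFAULT_OFF_SINCE = {
    "i40en": (2, 9, 2),
    "icen": (1, 14, 2),
}


def parse_version(text):
    """Turn a string such as '2.9.2.0' into a tuple of ints, ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def loopback_off_by_default(driver, version):
    """Return True or False for drivers listed above, None for any other driver."""
    threshold = DEFAULT_OFF_SINCE.get(driver)
    if threshold is None:
        return None  # driver not covered by the article
    return parse_version(version) >= threshold
```

For example, `loopback_off_by_default("i40en", "2.9.2.0")` returns True, while an older i40en version returns False and an unlisted driver returns None.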

Additional Information

The VMDQ loopback feature is used when SR-IOV is enabled, but in this case the problem was in the ENS code of the driver.