VMs on the same ESXi host as the Edge have North-South connectivity issues while ICMP traffic is not affected

Article ID: 325069

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • NSX-T Data Center version 3.1.0 or 3.1.1.
  • The Edge uses different Host Switches (N-VDS) for Overlay and VLAN traffic.
  • The Edge uplink profile associated with the Host Switch (N-VDS) joined to the Overlay Transport Zone uses VLAN 0 (VLAN tagging is done on the ESXi vSwitch).
  • The Edge VM's vNIC used for encapsulated traffic is connected to an NSX VLAN logical segment on an N-VDS/VDS which also hosts overlay workload VMs.
  • VMs placed on this host and connected to the same Host Switch (N-VDS/VDS) as this Edge vNIC have North-South TCP/UDP connectivity issues, while ICMP traffic is not affected.


Environment

VMware NSX-T

Cause

The issue is caused by Flow Cache on the ESXi host, which incorrectly sets a VLAN tag on encapsulated packets sent to an Edge residing on the same ESXi host. Since the Edge does not expect the VLAN tag, the packets are dropped.

Resolution

This issue is resolved in NSX-T 3.1.2, available at VMware Downloads.

Workaround:
If you cannot upgrade, apply one of the following workarounds (in order of preference):

Workaround A: change the Edge configuration to avoid the issue.
This is the preferred solution.

1. Create a trunk VLAN LS which includes the Edge Overlay VLAN (see the API sketch after the note below).
2. Change the Edge Uplink Profile to use the Edge Overlay VLAN (instead of VLAN 0).
3. Attach the Edge VTEP vNIC to the trunk VLAN LS created in step 1.
   
Note: North-South traffic will be affected while performing the above changes, so we recommend making them during a maintenance window. To reduce the impact, you can create new Edges with the above configuration and use the "Replace Edge Cluster Member" option.
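
If you prefer to automate step 1, a trunk VLAN segment can also be created through the NSX Policy API. This is only a sketch: the segment ID, display name, VLAN range and transport zone path below are placeholders to adapt to your environment, and the same segment can be created in the NSX UI instead.

# Placeholders throughout; the vlan_ids range must include the Edge Overlay VLAN.
curl -k -u 'admin:<password>' \
  -X PUT "https://<NSXMgr IP>/policy/api/v1/infra/segments/edge-trunk-ls" \
  -H "Content-Type: application/json" \
  -d '{
        "display_name": "edge-trunk-ls",
        "vlan_ids": ["100-200"],
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<VLAN-TZ-UUID>"
      }'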

Workaround B: disable the InterTEP feature.

The InterTEP feature "shortcuts" Overlay communication between the ESXi host TEP and the Edge TEP when VMs run on the same ESXi host as the Edge: TEP-to-TEP packets do not have to leave the ESXi host and can be delivered directly to the Edge TEP, which improves performance.

Note: this workaround will not work if the Edge and ESXi host TEPs are on the same subnet and the Edge VM is connected to the same N-VDS/VDS as the workload VMs.

To disable the InterTEP feature:

1. Retrieve the current configuration using the following GET API:
GET https://<NSXMgr IP>/api/v1/edge-tuning-configuration
Response Body
{
  "lsp_inter_vtep_enable": true,
  "resource_type": "EdgeTuningParameters",
  "id": "8bbc3064-7165-46ff-b086-2d7fdc4909b0",
  "display_name": "8bbc3064-7165-46ff-b086-2d7fdc4909b0",
  "_create_user": "system",
  "_create_time": 1596160130545,
  "_last_modified_user": "admin",
  "_last_modified_time": 1596161099768,
  "_system_owned": false,
  "_protection": "NOT_PROTECTED",
  "_revision": 2
}
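
For example, the same GET call can be issued with curl (credentials and the manager address below are placeholders; -k skips certificate verification):

curl -k -u 'admin:<password>' \
  "https://<NSXMgr IP>/api/v1/edge-tuning-configuration"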


2. Change the configuration using the following PUT API:
PUT https://<NSXMgr IP>/api/v1/edge-tuning-configuration
Request Body
{
  "lsp_inter_vtep_enable": false,
  "resource_type": "EdgeTuningParameters",
  "_revision": 2
}
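
The equivalent curl call is sketched below. Note that the "_revision" value in the request body must match the one returned by the GET in step 1; otherwise the API rejects the update. Credentials and the manager address are placeholders.

curl -k -u 'admin:<password>' \
  -X PUT "https://<NSXMgr IP>/api/v1/edge-tuning-configuration" \
  -H "Content-Type: application/json" \
  -d '{
        "lsp_inter_vtep_enable": false,
        "resource_type": "EdgeTuningParameters",
        "_revision": 2
      }'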


Workaround C: disable ESXi Flow Cache on all the ESXi hosts where Edge VMs are running.

Command to disable ESXi Flow Cache: #nsxdp-cli fc disable

To disable Flow Cache persistently across reboots, additionally edit /etc/vmware/nsx/nsx-cfgagent.xml on each ESXi host and make the following change:

  <flowCache>
     <enabled>false</enabled>
     <mcastEnabled>false</mcastEnabled>
  </flowCache>
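
As a rough sketch, both steps can be combined on each affected ESXi host from a root shell (back up the file before editing it):

nsxdp-cli fc disable                         # runtime change, effective immediately
cp /etc/vmware/nsx/nsx-cfgagent.xml /etc/vmware/nsx/nsx-cfgagent.xml.bak
vi /etc/vmware/nsx/nsx-cfgagent.xml          # set <enabled> and <mcastEnabled> to false in <flowCache>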


Note: disabling Flow Cache has a performance impact, as it provides a 20-30% throughput improvement.