Incoming traffic from external network to CNF pods is dropped in Edge nodes and is not forwarded to vTEP of ESXi hosts

Article ID: 414258

Products

VMware NSX

Issue/Introduction

Incoming TCP and ICMP traffic from the external network is dropped on the NSX Edge nodes and is not forwarded to the CNF pods on the return path. Outgoing traffic from the CNF pods to the external network through the Edge nodes is not affected. The issue is observed for a few pods instantiated on the same worker node on the same ESXi host.

Environment

VMware NSX 4.2.1.4.0

Cause

The traffic is dropped because of a duplicate ARP reply received from another pod on a different worker node, which runs on a faulty ESXi host in the network.

This incorrect ARP reply repeatedly updates the MAC address table of the logical switch with the wrong remote vTEP IP. As a result, incoming packets for some of the pods created with the same IP address are forwarded to the wrong transport node.
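The overwrite effect described above can be illustrated with a simplified Python model. It assumes that the logical switch keeps one dynamically learned remote vTEP per MAC address, that the most recently learned value replaces the previous one, and that the duplicate reply is learned against the same pod MAC entry. The function names and addresses are hypothetical and do not reflect how NSX is implemented internally.

```python
# Simplified, hypothetical model of a dynamically learned MAC table:
# each MAC maps to the remote vTEP IP it was most recently learned from.
# Illustration only; this is not the NSX data plane implementation.


def learn(mac_table: dict[str, str], mac: str, remote_vtep: str) -> None:
    """Record or overwrite the remote vTEP for a MAC; the latest event wins."""
    mac_table[mac] = remote_vtep


def simulate(events: list[tuple[str, str]]) -> dict[str, str]:
    """Replay (mac, remote_vtep) learning events in order and return the final table."""
    table: dict[str, str] = {}
    for mac, vtep in events:
        learn(table, mac, vtep)
    return table


if __name__ == "__main__":
    # Hypothetical addresses: 192.0.2.10 is the TEP of the pod's real ESXi host,
    # 192.0.2.99 is the TEP of the faulty ESXi host sending duplicate replies.
    events = [
        ("00:50:56:aa:bb:01", "192.0.2.10"),  # learned from the pod's real host
        ("00:50:56:aa:bb:01", "192.0.2.99"),  # duplicate ARP reply via the faulty host
    ]
    print(simulate(events))  # {'00:50:56:aa:bb:01': '192.0.2.99'}
```

Because the faulty host keeps sending these replies, the wrong entry keeps being re-learned, so the return traffic for the affected pod continues to go to the wrong transport node.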

To check whether this condition is present in the network, first identify the UUID of the logical switch (segment) to which the Edge node's overlay (Geneve) interfaces are connected.

You can obtain it by running get segment or get logical-switches on the Edge nodes.

Then, using the segment UUID, display the MAC address table learned by this switch on the Edge node by running get segment <segment UUID> mac-address-table.

In the output, verify that the Remote vTEP IP for each pod MAC address matches the TEP IP of the ESXi host on which the pod and its worker node are deployed.

Sample output:

MAC      :  xx:xx:xx:xx:xx:xx  (MAC address of the pod, as learned by the logical switch on the Edge)
Tunnel   :  UUID of the Geneve tunnel
IFUID    :  XXX
LOCAL    :  vTEP IP of the Edge node
Remote   :  vTEP IP of the remote ESXi host on which the pod is deployed, to which the reverse traffic is to be forwarded
Encap    :  GENEVE
Source   :  Dynamic
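When many entries need checking, the comparison can be scripted. The Python sketch below assumes the CLI output has been saved as text in the "Key : Value" layout of the sample above, and that you have built your own mapping from each pod MAC address to the TEP IP of its ESXi host (for example, from your inventory). The exact output layout, the function names and the mapping are assumptions; this is not an NSX-provided tool.

```python
import re

# Matches "Key : Value" lines like the sample output; real CLI spacing is assumed.
FIELD_RE = re.compile(r"^\s*(\w+)\s*:\s*(.*?)\s*$")


def parse_mac_table(text: str) -> list[dict[str, str]]:
    """Split saved CLI output into records; each "MAC" line starts a new record."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        key, value = match.group(1).lower(), match.group(2)
        if key == "mac":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def find_mismatches(
    records: list[dict[str, str]], expected_tep: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (mac, remote, expected) for pods whose Remote vTEP is not the expected TEP."""
    expected = {mac.lower(): tep for mac, tep in expected_tep.items()}
    mismatches = []
    for record in records:
        mac = record.get("mac", "").lower()
        want = expected.get(mac)
        if want is not None and record.get("remote") != want:
            mismatches.append((mac, record.get("remote", ""), want))
    return mismatches


if __name__ == "__main__":
    # Hypothetical addresses; 192.0.2.99 stands in for the faulty host's TEP.
    sample = """
MAC    : 00:50:56:aa:bb:01
Remote : 192.0.2.99
Encap  : GENEVE
MAC    : 00:50:56:aa:bb:02
Remote : 192.0.2.10
Encap  : GENEVE
"""
    expected = {"00:50:56:aa:bb:01": "192.0.2.10", "00:50:56:aa:bb:02": "192.0.2.10"}
    print(find_mismatches(parse_mac_table(sample), expected))
    # [('00:50:56:aa:bb:01', '192.0.2.99', '192.0.2.10')]
```

Any MAC reported here is a candidate for the duplicate-ARP condition, and its Remote value points at the transport node to investigate.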

Resolution

Remove the faulty or non-functional ESXi host from the network (cluster or DVS), or shut down its uplinks, until the issue with the ESXi host is resolved.

Additional Information

Alternatively, in the support bundle of the Edge node, open the logical-switch file inside the edge folder and check that the remote-vtep-ip recorded for the pod's MAC address in the mac-table is correct.

Sample entry:

{ 
  "vlan": 4096,
  "mac": "xx:xx:xx:xx:xx:xx",
  "local-vtep-ip": "xxx.xxx.xxx.xx",
  "remote-vtep-ip": "xxx.xxx.xx.x",
  "encap": "GENEVE",
  "ifuuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx",
  "ifuid": xxx,
  "source": "mac-learning"
},
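The same check can be applied to the support bundle. This Python sketch assumes the mac-table section has been extracted as a JSON array of objects shaped like the sample entry above, with the masked values filled in; the real file layout may differ, and the helper name and the expected-TEP mapping are hypothetical.

```python
import json


def find_bad_entries(mac_table_json: str, expected_tep: dict[str, str]) -> list[dict]:
    """Return mac-table entries whose remote-vtep-ip differs from the expected TEP IP.

    Assumes a JSON array of objects with "mac" and "remote-vtep-ip" keys,
    as in the sample entry; the actual support-bundle layout may differ.
    """
    expected = {mac.lower(): tep for mac, tep in expected_tep.items()}
    bad = []
    for entry in json.loads(mac_table_json):
        want = expected.get(entry.get("mac", "").lower())
        if want is not None and entry.get("remote-vtep-ip") != want:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    # Hypothetical entries; masked values from the sample replaced with examples.
    sample = """[
      {"vlan": 4096, "mac": "00:50:56:aa:bb:01", "local-vtep-ip": "192.0.2.1",
       "remote-vtep-ip": "192.0.2.99", "encap": "GENEVE", "source": "mac-learning"},
      {"vlan": 4096, "mac": "00:50:56:aa:bb:02", "local-vtep-ip": "192.0.2.1",
       "remote-vtep-ip": "192.0.2.10", "encap": "GENEVE", "source": "mac-learning"}
    ]"""
    expected = {"00:50:56:aa:bb:01": "192.0.2.10", "00:50:56:aa:bb:02": "192.0.2.10"}
    for entry in find_bad_entries(sample, expected):
        print(entry["mac"], entry["remote-vtep-ip"])  # 00:50:56:aa:bb:01 192.0.2.99
```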