Intermittent connectivity for VMs on the same Geneve segment, different ESXi hosts when there's an L2 Bridge configured

Article ID: 317789

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

  • When testing with ICMP, connectivity between two VMs connected to the same Geneve segment works for a few minutes, fails, recovers without intervention, and then fails again.
  • This issue has been observed in environments where the L2 bridge configuration uses "promiscuous mode". This is one of the three options for the advanced settings required as part of the L2 bridge configuration:

  • Documentation: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/administration/GUID-F133B293-5DEA-4DC8-99DB-6EF004C8D8D7.html


- The output of "get logical-switch <UUID> mac-table" shows that, while the problem is occurring, the MAC address of the destination VM is associated with the TEP of the Edge transport node hosting the active instance of the L2 bridge, and not with the TEP of the ESXi host where the destination VM resides. For example:

- On the host:

nsxcli -c 'get logical-switch 697cd139-####-####-####-##########2d mac-table' | grep "83:e1:2e\|Entry"

                             Host Kernel Entry
 00:50:56:##:##:2e 00:50:56:##:##:4e 10.220.136.11 0xb ==> Edge TEP
                             LCP Remote Entry
 00:50:56:##:##:2e 00:50:56:##:##:4e 10.224.135.12
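
The comparison above can be automated. The following is a minimal sketch, not a supported tool: the function name check_mac_table is hypothetical, and it assumes the output has already been filtered down to a single MAC address with grep, in the same layout as the sample above (section header lines containing "Entry", followed by lines whose third field is the outer IP).

```shell
# Hypothetical helper: reads grep-filtered "get logical-switch <UUID> mac-table"
# output on stdin and compares the outer IP in the Host Kernel Entry section
# with the outer IP in the LCP Remote Entry section.
check_mac_table() {
    awk '
        /Host Kernel Entry/ { section = "kernel"; next }
        /LCP Remote Entry/  { section = "remote"; next }
        /Entry/             { section = "";       next }
        # Data lines: inner MAC, outer MAC, outer IP, optional flags
        NF >= 3 && $3 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            if (section == "kernel") kernel_ip = $3
            if (section == "remote") remote_ip = $3
        }
        END {
            if (kernel_ip == "" || remote_ip == "")
                print "UNKNOWN"
            else if (kernel_ip == remote_ip)
                print "OK " kernel_ip
            else
                print "MISMATCH kernel=" kernel_ip " remote=" remote_ip
        }
    '
}
```

Piping the filtered nsxcli output shown above into check_mac_table would report a mismatch while the issue is occurring, because the kernel entry points to the Edge TEP and the LCP entry does not.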

- The output of the command "net-vdl2 -n <logical segment VNI number> -M mac -s <name of NSX related switch>" also shows the incorrect MAC table entry pointing to the Edge TEP while the issue is occurring:

 Inner MAC: 00:50:56:##:##:2e
        Outer MAC: 00:50:56:##:##:4e
        Outer IP: 10.220.136.11
        Flags: (V,U,A)
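
To pull the outer IP for one inner MAC out of a longer net-vdl2 listing, something like the following may help. It is only a sketch: outer_ip_for_mac is a hypothetical helper name, and it assumes each entry is printed as an "Inner MAC:" line followed by an "Outer IP:" line, as in the sample above.

```shell
# Hypothetical helper: prints the Outer IP recorded for a given inner MAC.
# $1 is the inner MAC to look up; the net-vdl2 MAC table output is read from stdin.
outer_ip_for_mac() {
    awk -v want="$1" '
        BEGIN { want = tolower(want) }
        # Remember which inner MAC the following lines belong to
        /Inner MAC:/ { current = tolower($NF) }
        # Print the outer IP of the first entry that matches, then stop
        /Outer IP:/ && current == want { print $NF; exit }
    '
}
```

Usage would look like: net-vdl2 -n <logical segment VNI number> -M mac -s <name of NSX related switch> | outer_ip_for_mac <destination VM MAC>. If the printed address is the Edge TEP rather than the TEP of the ESXi host where the destination VM resides, the entry is the incorrect one described here.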

- While the ping fails, both the Edge and the NSX-T Manager show the MAC table entry for the destination MAC (00:50:56:##:##:2e in this example) pointing to 10.224.135.12 rather than to the Edge TEP:

- On the Edge:

nsxcli -c 'get logical-switch 697cd139-####-####-####-##########2d mac-address-table'

 MAC : 00:50:56:##:##:2e
        Tunnel : 9c0192fc-####-####-####-##########09
        IFUID : 448
        LOCAL : 10.220.136.11
        REMOTE : 10.224.135.12
        ENCAP : GENEVE
        SOURCE : Static


- On the NSX-T manager:

root@CYSNSXM02:~# nsxcli -c 'get logical-switch 697cd139-####-####-####-##########2d mac-table'
VNI MAC VTEP-IP TransportNode-ID
71691 00:50:56:##:##:2e 10.224.135.12 c14cff84-####-####-####-##########e8




- The reverse path filter has been configured on the ESXi host where the Edge is running:

esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc
   Path: /Net/ReversePathFwdCheckPromisc
   Type: integer
   Int Value: 1
   Default Int Value: 0
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:

   Description: Block duplicate packet in a teamed environment when the virtual switch is set to Promiscuous mode.
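
When checking several hosts, the current value can be extracted from this output with a short filter. This is a minimal sketch: rpf_value is a hypothetical helper name, and it assumes the "Int Value:" line appears exactly as in the listing above.

```shell
# Hypothetical helper: prints the current "Int Value" from the output of
# "esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc",
# read from stdin. The "Default Int Value" line is deliberately not matched.
rpf_value() {
    awk -F': *' '
        /^ *Int Value:/ {
            gsub(/[ \t]+$/, "", $2)   # drop trailing whitespace
            print $2
            exit
        }
    '
}
```

For example, esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc | rpf_value prints 1 on a host where the filter is set, as in the listing above.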



Environment

The issue has been observed in NSX-T versions 3.0.1 and 3.1.1.

Cause

When NSX-T is enabled on a VDS, ESXi advanced configuration options such as ReversePathFwdCheckPromisc are synced from the old DVS to the new N-VDS. However, if the sync is not applied correctly after a reboot of the ESXi host, the behavior described above may result.
 
This can also happen when the ESXi host reboots between the time NSX was enabled on the DVS and the time an ESXi advanced configuration option such as ReversePathFwdCheckPromisc is manually set to 1.

Resolution


This issue is resolved in NSX-T 3.1.2 and later.
 
Even when the ReversePathFwdCheckPromisc filter has been applied, the issue may still be present for the reasons described in the Cause section. In that case, apply the following workaround.

Workaround:

On the ESXi host running the Edge bridge VM that hosts the active instance of the bridge, run the following:

  1. Set ReversePathFwdCheckPromisc: esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1

  2. Then run: nsxdp-cli vswitch runtime set ReversePathFwdCheckPromisc 1

  3. Disable and re-enable the promiscuous setting on the DVPG to which the bridge vNIC is connected. Keep in mind that traffic using the bridge is impacted while promiscuous mode is disabled.
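
Steps 1 and 2 can be wrapped in a small script so they are applied in the same order on every host. The sketch below is illustrative only: the function names and the DRY_RUN variable are hypothetical, while the two commands are the ones given in the steps above. Step 3 (toggling promiscuous mode on the DVPG) is not covered and still has to be done separately.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Applies workaround steps 1 and 2 in order; stops if step 1 fails.
apply_rpf_workaround() {
    run_cmd esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1 &&
    run_cmd nsxdp-cli vswitch runtime set ReversePathFwdCheckPromisc 1
}
```

Running it with DRY_RUN=1 first prints the two commands without changing anything, which may be useful for review before applying them on the host.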

 

NOTE 1: Ensure that ReversePathFwdCheckPromisc is set on every ESXi host that may host the Edge bridge VM. If the Edge bridge VM moves to a host where ReversePathFwdCheckPromisc is not set, traffic may be impacted as described above.

NOTE 2: In NSX-T version 3.0.1, the setting applied with the "nsxdp-cli" command does not persist across reboots. For an alternative solution that persists across reboots, contact VMware Support.