Issue Symptoms
NSX-T 3.2.2
ESXi 7.0.3 (build 23307199)
When latency metric collection was enabled in vRNI, traceflow was enabled on the ESX hosts where the Edge nodes were deployed so that datapath metrics could be collected from the Edge nodes. With traceflow enabled, all multicast packets (OSPF control packets) were directed to the non-ENS path.
When a new service config (to disable latency metric collection) was created and applied from vRNI, a latency profile with traceflow disabled was applied to the ESX hosts where the Edge nodes were hosted. As a result, ENS started to process OSPF packets in the fast path. By design and implementation, ENS supports only 8 destination ports for each multicast flow. When that limit is reached, OSPF goes down between the Edge nodes and their uplink neighbors.
In general, this issue can be seen with any multicast traffic: multicast flows are impacted once the limit of 8 destination ports is reached.
This issue is resolved in NSX 3.2.5, 4.1.1, and 4.2.x.
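The fixed releases listed above can be turned into a quick check, for example when auditing several environments. The following is a minimal Python sketch, not an official tool. The function names are hypothetical, and it assumes that later maintenance releases within the same release line (for example 3.2.6 after 3.2.5) also carry the fix. Release lines that the article does not name return None rather than a guess.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.2.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version):
    """Return True/False for release lines named in the article, else None.

    Assumption (not stated in the article): maintenance releases that come
    after a fixed release in the same line also include the fix.
    """
    v = parse_version(version)
    v = v + (0,) * (3 - len(v))  # pad '4.2' to (4, 2, 0)
    line = v[:2]
    if line == (3, 2):
        return v >= (3, 2, 5)
    if line == (4, 1):
        return v >= (4, 1, 1)
    if line == (4, 2):
        return True  # the article lists 4.2.x as fixed
    return None  # release line not covered by the article


if __name__ == "__main__":
    for candidate in ("3.2.2", "3.2.5", "4.1.0", "4.2.0"):
        print(candidate, has_fix(candidate))
```

For the environment described in this article (NSX-T 3.2.2), the check reports that the fix is not present.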
Initial checks on ESX host
> esxcfg-nics -e
From the output, if the columns "ENS capable" and "ENS Driven" are set to "True" for the physical vmnicX, then ENS is enabled on that ESXi host.
> net-dvs -l | grep -i fc.mcast
From the output, check whether "com.vmware.net.portset.fc.mcast.enabled" is set to "true". If it is, multicast flow is enabled in the datapath.
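When this check has to be repeated across many hosts, the captured command output can be scanned programmatically. The sketch below is a hedged Python example: the function name is hypothetical, and because the exact line layout of the net-dvs output is not shown in this article, it only assumes that the property name is followed somewhere on the same line by the word true or false.

```python
import re

# Property name taken from the article.
KEY = "com.vmware.net.portset.fc.mcast.enabled"


def mcast_fastpath_enabled(output):
    """Scan captured 'net-dvs -l' text for the multicast flow-cache property.

    Returns True or False when the property is found with a boolean value,
    or None when it is absent. The line layout is an assumption: the value
    is taken to be the first 'true'/'false' token after the property name.
    """
    for line in output.splitlines():
        lower = line.lower()
        pos = lower.find(KEY)
        if pos == -1:
            continue
        rest = lower[pos + len(KEY):]
        match = re.search(r"\b(true|false)\b", rest)
        if match:
            return match.group(1) == "true"
    return None
```

The function takes the saved text of the command output as a string, so it can be run on a workstation against collected logs rather than on the host itself.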
> cat /etc/vmware/nsx/nsx-cfgAgent.xml
From the output, check whether "mcastEnabled" is set to "true", as shown below:
<features>
<flowCache>
<enabled>true</enabled>
<mcastEnabled>true</mcastEnabled>
</flowCache>
</features>
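The same check can be scripted against a copy of the file. This is a minimal Python sketch using the standard library XML parser. The function name is hypothetical, and it assumes the file is well-formed XML without namespaces and that the flowCache element appears as in the fragment above. The full layout of nsx-cfgAgent.xml is not shown in this article.

```python
import xml.etree.ElementTree as ET


def flow_cache_flags(xml_text):
    """Return the flowCache 'enabled' and 'mcastEnabled' flags as booleans.

    Returns None when no <flowCache> element is found. A flag that is
    missing from the element is reported as None.
    """
    root = ET.fromstring(xml_text)
    # The fragment may be the whole document or nested deeper in the file.
    node = root if root.tag == "flowCache" else root.find(".//flowCache")
    if node is None:
        return None

    def flag(name):
        value = node.findtext(name)
        if value is None:
            return None
        return value.strip().lower() == "true"

    return {"enabled": flag("enabled"), "mcastEnabled": flag("mcastEnabled")}


# Fragment copied from the article.
SAMPLE = """<features>
<flowCache>
<enabled>true</enabled>
<mcastEnabled>true</mcastEnabled>
</flowCache>
</features>"""

print(flow_cache_flags(SAMPLE))  # {'enabled': True, 'mcastEnabled': True}
```

Both flags being true matches the state described in this article, where ENS processes multicast flows in the fast path.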
> nsxdp-cli ens flow-table dump
Flows marked with MC indicate that multicast is enabled. When the issue is present, all multicast flows are marked as invalidated.