Procedure to verify broadcast and unknown unicast flooding in NSX environments

Article ID: 437804

Updated On:

Products

VMware NSX

Issue/Introduction

Network anomalies are reported, and unknown unicast flooding on the virtual network path is suspected. The impact is observed as intermittent connectivity or performance degradation within the network infrastructure. This can occur with specific service insertion configurations.

Environment

  • VMware NSX versions earlier than 3.2.5.X
  • Third-party solutions that use Service Insertion

Cause

Flooding can be caused by the service insertion code in the NSX Central Control Plane (CCP). From the data path, the customer can check whether flooding is occurring, or whether requests are going unanswered, for the MAC address concerned. The CCP logs on the NSX Manager also show whether the fix made in the CCP is working as expected.

Resolution

  • Map the logical switch VNI identifiers by executing the following command: nsxcli -c get logical-switches

  • Verify the unknown unicast flooding counters at the host level. Replace <VNI> with the VNI found in the previous step for the data path virtual interface, and replace <DVS> with the name of the DVS to which the data path virtual NIC is attached: net-vdl2 -S -s <DVS>-1 -n <VNI>

  • Correlate VM MAC address behaviour on the segment using: net-vdl2 -M mac -s <DVS> -n <VNI>
    for example:

    Executing:: net-vdl2 -M mac -s <DVS> -n XXXXX 
    Legend: [V:Valid], [U:in Use],
    Legend: [S:Seen - learnt or extended during the last ageing period],
    Legend: [A:Aged - not updated in during the last ageing period],
    Legend: [R:Auto Refresh],
    Legend: [G:VTEP Group - learnt from VTEP Group]
    MAC Entry Count:        24
            Inner MAC:      00:50:56:YY:YY:YY
            Outer MAC:      ff:ff:ff:ff:ff:ff
            Outer IP:       255.255.255.255
            Flags:          (U,S,R)

    If the Outer MAC value is ff:ff:ff:ff:ff:ff, or the Flags field does not have the V flag set, then unknown unicast flooding would occur for MAC 00:50:56:YY:YY:YY.

  • These values alone are not necessarily a sign of a flooding problem and may be normal. Perform a packet capture on the VM data path virtual NIC to verify whether broadcast ARP requests from the VM are failing to receive responses.
  • Review the mac.lookup.flood counter. If the counter remains stable and does not increase very rapidly (several thousand per second), the network layer is operating normally.

  • Subsequent troubleshooting should then focus on the upstream physical gateway or the receiving end device to determine why ARP requests are not being answered.

  • Additionally, check that no stale VIFs (virtual interfaces) are reported in the NSX Manager Control Plane logs (/var/log/cloudnet/nsx-ccp.log and /var/log/syslog), for example:

    XX/YY/ZZ  INFO Owl-worker-0 ServicePathEventHandler 88250 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="service-insertion"] SiServicePathHop with vif UUU-UU-UU-UUU is not active, is_active_from_mp: true, is_active_from_ccp: true, is_active_from_dp: false
    XX/YY/ZZ  INFO Owl-worker-0 ServicePathEventHandler 88250 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="service-insertion"] SiServicePathHop with vif UUU-UU-UU-UUU is not active, is_active_from_mp: true, is_active_from_ccp: true, is_active_from_dp: false

    XX/YY/ZZ  INFO Owl-worker-3 ServicePathEventHandler 88250 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="service-insertion"] SiServicePathHop with vif UUU-UU-UU-UUU is not active, is_active_from_mp: true, is_active_from_ccp: true, is_active_from_dp: false
    XX/YY/ZZ  INFO Owl-worker-3 ServicePathEventHandler 88250 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="service-insertion"] SiServicePathHop with vif UUU-UU-UU-UUU is not active, is_active_from_mp: true, is_active_from_ccp: true, is_active_from_dp: false
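
The text-based checks in the steps above can be sketched as a small offline helper. This is only an illustration, not a VMware tool: it assumes the net-vdl2 MAC table output and the CCP log lines have been saved to text in the layout shown in this article, and the function names (parse_mac_entries, flooding_macs, flood_rate, stale_vifs) are hypothetical. The layout of the net-vdl2 -S output is not shown here, so flood_rate takes two mac.lookup.flood readings that were collected manually.

```python
import re

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

# Assumed log wording, copied from the sample lines in this article.
STALE_VIF = re.compile(
    r"SiServicePathHop with vif (\S+) is not active.*is_active_from_dp: false"
)


def parse_mac_entries(text):
    """Group 'Inner MAC / Outer MAC / Flags' lines into one dict per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no "key: value" pair on this line
        key, value = key.strip(), value.strip()
        if key == "Inner MAC":
            current = {"inner_mac": value.lower()}
            entries.append(current)
        elif key == "Outer MAC" and current:
            current["outer_mac"] = value.lower()
        elif key == "Flags" and current:
            # "(U,S,R)" -> {"U", "S", "R"}
            current["flags"] = set(value.strip("()").replace(" ", "").split(","))
    return entries


def flooding_macs(entries):
    """Inner MACs whose outer MAC is broadcast or whose flags lack V."""
    return [
        e["inner_mac"]
        for e in entries
        if e.get("outer_mac") == BROADCAST_MAC or "V" not in e.get("flags", set())
    ]


def flood_rate(count_before, count_after, seconds):
    """Increase of the mac.lookup.flood counter per second between two readings."""
    return (count_after - count_before) / seconds


def stale_vifs(log_text):
    """Unique VIF IDs logged as not active on the data path, in first-seen order."""
    return list(dict.fromkeys(STALE_VIF.findall(log_text)))
```

A rate from flood_rate in the range of several thousand per second would match the rapid increase described above, while a stable counter suggests the network layer is operating normally. Any MAC returned by flooding_macs still needs the packet capture described earlier before concluding that flooding is a problem.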

Additional Information

Ensure the environment is operating on the recommended patch level (minimum version: NSX 3.2.5.X) to validate that the underlying infrastructure is functioning within normal parameters.
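
The minimum-version check can be sketched as follows. This is a minimal illustration that assumes the installed NSX version is available as a dotted string; meets_minimum is a hypothetical helper, not part of any NSX tooling, and only the first three numeric components are compared because the article gives the minimum as 3.2.5.X.

```python
import re

MIN_VERSION = (3, 2, 5)  # from this article: minimum NSX 3.2.5.X


def meets_minimum(version, minimum=MIN_VERSION):
    """Compare the leading numeric components of a version string to the minimum."""
    numbers = re.findall(r"\d+", version)[: len(minimum)]
    return tuple(int(n) for n in numbers) >= minimum
```

For example, meets_minimum("3.2.4.1") is False, while meets_minimum("3.2.5.0") and meets_minimum("4.1.0.2") are True.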