DFW rules are incorrectly applied to NSX Edge ESGs and DLRs

Article ID: 314295

Products

VMware vDefend Firewall

Issue/Introduction

Symptoms:

  • NSX Edge Services Gateway (ESG) VMs or Distributed Logical Router (DLR) appliances experience partial or complete packet loss.
  • Traffic flows traversing NSX ESG experience reduced performance.
  • ESGs or DLRs impacted were recently created, upgraded or redeployed.
  • When you run the summarize-dvfilter command from an ESXi command prompt, you see a 'slot-2' dvFilter applied to the ESG or DLR appliance even though no DFW rules are configured to apply to it.

    For example:

    world 4498886 vmm0:Testing-Edge-1-0 vcUuid:'50 1e 5b 00 2f 49 1c a0-xx xx xx xx xx xx'
    port 50331766 Testing-Edge-1-0.eth0
    vNic slot 2
    name: nic-4498886-eth0-vmware-sfw.2
    agentName: vmware-sfw
    state: IOChain Attached
    vmState: Detached
    failurePolicy: failClosed
    slowPathID: none
    filter source: Dynamic Filter Creation
    vNic slot 1
    name: nic-4498886-eth0-dvfilter-generic-vmware-swsec.1
    agentName: dvfilter-generic-vmware-swsec
    state: IOChain Attached
    vmState: Detached
    failurePolicy: failClosed
    slowPathID: none
    filter source: Alternate Opaque Channel

     
  • In the NSX Manager logs, you see entries similar to:

    2018-02-09 13:42:05.237 CST INFO VirtualMachineDvfilterMonitor-1 VirtualMachineWorkQueue:534 - Updating exclude list with vm-245204 (Testing-Edge-0)...
    2018-02-09 13:42:05.237 CST INFO VirtualMachineDvfilterMonitor-1 ExcludeListServiceImpl:240 - Get VnicIds for Exclude list Member Ids
    2018-02-09 13:42:05.237 CST INFO VirtualMachineDvfilterMonitor-1 FirewallInstallManagerImpl:443 - Getting firewall enabled clusterIds
    2018-02-09 13:42:05.238 CST INFO VirtualMachineDvfilterMonitor-1 FirewallInstallManagerImpl:452 - Firewall is enabled for the clusters : [domain-c71337, domain-c243225, domain-c71339, domain-c190120, domain-c80]
    2018-02-09 13:42:05.488 CST INFO VirtualMachineDvfilterMonitor-1 VirtualMachineWorkQueue:522 - Update DvFilter Settings failed. com.vmware.vshield.vsm.exceptions.ObjectNotFoundException: core-services:202:The requested object : domain-c243225 could not be found. Object identifiers are case sensitive.


    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
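
    The slot-2 check described above can be scripted when many hosts need to be reviewed. The following is a minimal Python sketch, not an official tool: it assumes the summarize-dvfilter output has been saved as text and follows the layout of the example above. The function name find_dfw_filters is hypothetical.

```python
import re

# Sample input copied from the summarize-dvfilter example in this article.
SAMPLE = """\
world 4498886 vmm0:Testing-Edge-1-0 vcUuid:'50 1e 5b 00 2f 49 1c a0-xx xx xx xx xx xx'
port 50331766 Testing-Edge-1-0.eth0
vNic slot 2
name: nic-4498886-eth0-vmware-sfw.2
agentName: vmware-sfw
state: IOChain Attached
vNic slot 1
name: nic-4498886-eth0-dvfilter-generic-vmware-swsec.1
agentName: dvfilter-generic-vmware-swsec
state: IOChain Attached
"""

WORLD_RE = re.compile(r"world \d+ vmm\d+:(.+?) vcUuid:")
PORT_RE = re.compile(r"port \d+ (.+)")


def find_dfw_filters(output):
    """Return (vm, port, filter_name) for every vmware-sfw filter found.

    Assumption: the output layout matches the example in this article.
    """
    hits = []
    vm = port = name = None
    for raw in output.splitlines():
        line = raw.strip()
        world = WORLD_RE.match(line)
        if world:
            vm, port, name = world.group(1), None, None
            continue
        port_match = PORT_RE.match(line)
        if port_match:
            port, name = port_match.group(1), None
            continue
        if line.startswith("name:"):
            # Remember the most recent filter name for this vNic slot.
            name = line.split(":", 1)[1].strip()
        elif line == "agentName: vmware-sfw":
            # vmware-sfw is the DFW agent; record where it is attached.
            hits.append((vm, port, name))
    return hits


if __name__ == "__main__":
    for vm, port, name in find_dfw_filters(SAMPLE):
        print(f"DFW filter {name} is attached to {vm} ({port})")
```

    Any ESG or DLR appliance reported by this sketch has a DFW filter attached and should be compared against the exclusion list.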

Environment

VMware NSX for vSphere 6.4.x
VMware NSX for vSphere 6.3.x

Cause

By default, NSX Manager tracks NSX components such as DLRs and ESGs as system resources and instructs the ESXi hosts not to apply DFW filters to these VMs. NSX Manager achieves this by publishing the list of excluded VMs (the exclusion list) to every NSX-prepared cluster. In this issue, the NSX Manager (vsm) logs show an exception for one of the clusters to which the exclusion list needs to be published, because a cluster that exists in the firewall_status_cluster table does not exist in domain_object_table or in the vCenter inventory. This database inconsistency arises when a user deletes an NSX-prepared cluster that contains no ESXi hosts from vCenter without first unpreparing the cluster in NSX. After that, if a user deploys or redeploys an NSX Edge, DFW filters can be applied to its VMs.

Note: This issue can affect any VM that has been added to the exclusion list, not just automatically tracked system objects such as DLRs or ESGs.
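
To identify which cluster entry is stale, the two relevant NSX Manager log lines can be correlated: the list of firewall-enabled clusters and the object named in the ObjectNotFoundException. The following Python sketch illustrates the idea. It assumes the log format matches the excerpts in this article; the function name stale_clusters is hypothetical, and the sketch does not query the NSX database.

```python
import re

# Two log lines copied from the NSX Manager excerpt in this article.
LOG = """\
2018-02-09 13:42:05.238 CST INFO VirtualMachineDvfilterMonitor-1 FirewallInstallManagerImpl:452 - Firewall is enabled for the clusters : [domain-c71337, domain-c243225, domain-c71339, domain-c190120, domain-c80]
2018-02-09 13:42:05.488 CST INFO VirtualMachineDvfilterMonitor-1 VirtualMachineWorkQueue:522 - Update DvFilter Settings failed. com.vmware.vshield.vsm.exceptions.ObjectNotFoundException: core-services:202:The requested object : domain-c243225 could not be found. Object identifiers are case sensitive.
"""

ENABLED_RE = re.compile(r"Firewall is enabled for the clusters : \[([^\]]*)\]")
MISSING_RE = re.compile(
    r"ObjectNotFoundException: .*?The requested object : (\S+) could not be found"
)


def stale_clusters(log_text):
    """Return cluster IDs that are firewall-enabled but reported as not found."""
    enabled = set()
    missing = set()
    for line in log_text.splitlines():
        match = ENABLED_RE.search(line)
        if match:
            ids = (item.strip() for item in match.group(1).split(","))
            enabled.update(item for item in ids if item)
        match = MISSING_RE.search(line)
        if match:
            missing.add(match.group(1))
    # A stale entry is listed as firewall-enabled yet cannot be resolved.
    return sorted(enabled & missing)


if __name__ == "__main__":
    for cluster_id in stale_clusters(LOG):
        print(f"Possible stale firewall_status_cluster entry: {cluster_id}")
```

A cluster ID reported by this sketch is only a candidate. Confirm it against the vCenter inventory before treating it as the stale entry.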

Resolution

This issue is resolved in:

  • VMware NSX for vSphere 6.3.6.
  • VMware NSX for vSphere 6.4.1.


Workaround:
If you cannot upgrade, run a 'Force Sync' of firewall services on the vSphere clusters that host the impacted ESGs/DLRs. This removes the slot-2 dvFilter.

Note: This is a temporary solution, as the filter may be applied again should the appliances be re-deployed or upgraded. The slot-2 filter will also be applied to any newly created ESGs until the stale DB entry is removed.

To remove the stale cluster entry from the firewall_status_cluster table, contact Broadcom Support and quote this Knowledge Base article ID (314295) in the problem description. 

Additional Information

Impact/Risks:
Because the firewall rules were likely not designed with the intention of filtering flows in/out of DLR and ESG appliances, unpredictable filtering may be experienced. This can result in partial or complete packet loss depending on the rule-set. Also, because ESGs tend to funnel a large amount of north/south traffic, the addition of the slot-2 filter increases overhead and may cause performance impact.