VM Traffic is Intermittently Dropped by DFW for 5 Seconds

Article ID: 389594

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:

  • VMs connected to DVPGs (Distributed Virtual Portgroups) are impacted
  • VMs on ESXi clusters where the "NSX on DVPG" feature was enabled at some point and later disabled are impacted
  • VM traffic is intermittently interrupted for 5 seconds at a time, typically around the time of a DFW rule publish
  • The dfwpktlog.log on the ESXi host shows matches on DFW rules for 5 seconds, even though the VM is connected to a DVPG and should not have any rules applied:

      • 2025-02-11T21:19:50.150Z No(13) FIREWALL-PKTLOG[2104641]: XXXXXXXX INET match REJECT 2 OUT 1348 TCP xxx.xxx.xxx.xxx/18374->xxx.xxx.xxx.xxx/30041 PA DFW Default Deny All
        2025-02-11T21:19:50.151Z No(13) FIREWALL-PKTLOG[2104641]: XXXXXXXX INET match REJECT 2 OUT 60 TCP xxx.xxx.xxx.xxx/54066->xxx.xxx.xxx.xxx/30013 S DFW Default Deny All
        2025-02-11T21:19:50.161Z No(13) FIREWALL-PKTLOG[2104641]: XXXXXXXX INET match REJECT 2 OUT 174 TCP xxx.xxx.xxx.xxx/40404->xxx.xxx.xxx.xxx/6514 PA DFW Default Deny All
        2025-02-11T21:19:50.178Z No(13) FIREWALL-PKTLOG[2104641]: XXXXXXXX INET match REJECT 2 OUT 396 TCP xxx.xxx.xxx.xxx/63452->xxx.xxx.xxx.xxx/30041 PA DFW Default Deny All

        *IP addresses above are obscured for privacy

  • During the 5-second window of the issue, DFW rules are applied to the VM. To check this:

    • Log in to the ESXi host as the root user
    • Run the command:
      • summarize-dvfilter | grep -A 3 <VM Name>

        Example:
        [root@esxcomp-2a:~] summarize-dvfilter | grep -A 3 vmm
        world 1371516 vmm0:Test_VM vcUuid:'50 20 92 e1 11 b7 10 d3-56 c5 e0 da 46 87 b5 d2'
         port 67108881 Test_VM.eth0
          vNic slot 2
           name: nic-1371516-eth0-vmware-sfw.2    <------

    • Use the slot 2 filter name (ending in sfw.2) in the following command:
      • watch vsipioctl getrules -f <slot 2 filter name>


    • The above command re-runs 'vsipioctl getrules' every 2 seconds, making it easier to observe changes in the output.
    • While the above command is running, make any change to the DFW configuration and publish it. The simplest way to do this is to enable logging on a rule and then publish.
    • Following the publish, the 'vsipioctl getrules' output shows that rules are applied for about 5 seconds.
       
      • Example:

        Before publish:

        vsipioctl getrules -f nic-2428322-eth0-vmware-sfw.2
        No rules.


        Within 5 seconds following publish (rules are observed):

        vsipioctl getrules -f nic-2428322-eth0-vmware-sfw.2

        ruleset mainrs {
        # generation number: 0
        # realization time : 2025-02-18T23:35:04
        # PRE_FILTER rules
        rule 5 at 1 inout protocol any from malicious to any drop tag 'MALICIOUS IP AT SOURCE RULE';
        rule 6 at 2 inout protocol any from any to malicious drop tag 'MALICIOUS IP AT DESTINATION RULE';
        # FILTER (APP Category) rules
        rule 1003 at 1 inout protocol any from any to any accept;
        rule 3 at 2 inout inet6 protocol ipv6-icmp icmptype 136 from any to any accept;
        rule 3 at 3 inout inet6 protocol ipv6-icmp icmptype 135 from any to any accept;
        rule 4 at 4 inout protocol udp from any to any port {67, 68} accept;
        rule 2 at 5 inout protocol any from any to any accept with log;
        # IDP rules
        rule 1009 at 1 inout protocol any from any to any with ids profile xxxxx-xxxxx-xxxxx-xxxxx-xxxxx idp_detect oversubscription inherit;
        }

        ruleset mainrs_L2 {
        # generation number: 0
        # realization time : 2025-02-18T23:35:04
        # FILTER rules
        rule 1 at 1 inout ethertype any stateless from any to any accept;
        }



        Beyond 5 seconds after publish:


        vsipioctl getrules -f nic-2428322-eth0-vmware-sfw.2
        No rules.
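
The check above can also be scripted so that the moment rules appear and disappear is recorded with a timestamp. The following is a minimal sketch, not part of the original procedure: the function names (extract_sfw2_filter, rules_state, watch_filter_rules) are hypothetical, and the sketch assumes only the two commands already used in this article (summarize-dvfilter and vsipioctl getrules -f) and that their output looks like the examples shown.

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers.

# Print the slot 2 filter name(s) found in summarize-dvfilter output on stdin.
extract_sfw2_filter() {
    awk '$1 == "name:" && $2 ~ /-vmware-sfw\.2$/ { print $2 }'
}

# Classify 'vsipioctl getrules' output read from stdin.
# Assumes the output formats shown in this article.
rules_state() {
    out=$(cat)
    case "$out" in
        *"ruleset "*)  echo RULES_PRESENT ;;
        *"No rules."*) echo NO_RULES ;;
        *)             echo UNKNOWN ;;
    esac
}

# Poll one filter every second and print a timestamped line on each state change.
# Runs until interrupted with Ctrl+C.
watch_filter_rules() {
    filter="$1"
    last=""
    while :; do
        state=$(vsipioctl getrules -f "$filter" | rules_state)
        if [ "$state" != "$last" ]; then
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $filter $state"
            last="$state"
        fi
        sleep 1
    done
}

# Example usage on the ESXi host (filter name taken from the example above):
#   summarize-dvfilter | grep -A 3 <VM Name> | extract_sfw2_filter
#   watch_filter_rules nic-2428322-eth0-vmware-sfw.2
```

On an affected host, a publish would be expected to produce a RULES_PRESENT line followed roughly 5 seconds later by a NO_RULES line for the same filter.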

 

 

Relevant ESXi logs:

 

vmkernel.log:
2025-02-20T00:04:08.118Z In(182) vmkernel: cpu7:270263)configured filter nic-267167-eth2-vmware-sfw.2
2025-02-20T00:04:08.118Z In(182) vmkernel: cpu7:270263)filter nic-267167-eth2-vmware-sfw.2 flushing flow cache
2025-02-20T00:04:16.755Z In(182) vmkernel: cpu7:270263)unconfigured filter nic-267167-eth2-vmware-sfw.2

nsx-syslog:
2025-02-20T01:07:47.334Z In(182) cfgAgent[270232]: NSX 270232 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6312A700" level="info"] dfw: Kernel filter nic-267167-eth2-vmware-sfw.2 has lost its vif id vif-3
2025-02-20T01:07:47.335Z In(182) cfgAgent[270232]: NSX 270232 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="635B3700" level="info"] dfw: Cleanup DFW config to filter nic-267167-eth2-vmware-sfw.2
2025-02-20T01:07:57.335Z In(182) cfgAgent[270232]: NSX 270232 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6312A700" level="info"] dfw: Kernel filter nic-267167-eth2-vmware-sfw.2 has lost its vif id vif-3
2025-02-20T01:07:57.337Z In(182) cfgAgent[270232]: NSX 270232 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="635B3700" level="info"] dfw: Cleanup DFW config to filter nic-267167-eth2-vmware-sfw.2

 

VIFs (virtual interfaces) labeled vif-x, such as vif-3 in the nsx-syslog entries above, are invalid and should not have rules applied.
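
The log signatures described in this article can be searched for with standard text tools. The sketch below is illustrative only: the function names are hypothetical, and the log file paths in the usage comments (/var/log/dfwpktlog.log and /var/log/nsx-syslog.log) are assumptions that should be verified on the host. It counts DFW packet-log matches per second, which makes a short burst easy to spot, and lists the filters that cfgAgent reported as having lost their VIF ID.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; both read log lines from stdin.

# Count DFW packet-log "match" entries per second.
# Output: one "<timestamp-to-the-second> <count>" line per second, sorted.
dfw_hits_per_second() {
    awk '/FIREWALL-PKTLOG/ && / match / {
             hits[substr($1, 1, 19)]++
         }
         END { for (s in hits) print s, hits[s] }' | sort
}

# List unique "<filter name> <vif id>" pairs from cfgAgent
# "has lost its vif id" messages.
filters_lost_vif() {
    sed -n 's/.*Kernel filter \([^ ]*\) has lost its vif id \([^ ]*\).*/\1 \2/p' | sort -u
}

# Example usage on the ESXi host (paths are assumptions):
#   dfw_hits_per_second < /var/log/dfwpktlog.log
#   filters_lost_vif < /var/log/nsx-syslog.log
```

Any filter listed by filters_lost_vif is a candidate for the rule check described in the Symptoms section.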

Environment

VMware NSX 4.2.0 - 4.2.1

Cause

After the "DFW on DVPG" feature (known as Security-Only prior to 4.2.x) is enabled and then disabled, the DFW applies rules to VMs without a VIF (virtual interface). VMs without VIFs should NOT have rules applied.

Resolution

Workaround:

  • For immediate relief, add a temporary allow rule to the top of the DFW rule list for the impacted workloads.
  • Then restart the nsx-cfgagent service on all ESXi hosts within the impacted clusters:
    • SSH into the ESXi host as the root user
    • Issue the following command:
      • /etc/init.d/nsx-cfgagent restart
  • The issue should be permanently resolved after restarting nsx-cfgagent
  • You can then proceed to remove the temporary allow rule
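
When many hosts are involved, the restart step can be looped from an administrative workstation. This is a rough sketch under stated assumptions: the function name restart_cfgagent and the DRY_RUN variable are hypothetical, the host names in the usage comment are placeholders, and root SSH access to each ESXi host is assumed to be enabled. The only command run on each host is the one given in the workaround above.

```shell
#!/bin/sh
# Sketch only: restart_cfgagent and DRY_RUN are hypothetical names.

# Restart nsx-cfgagent on each ESXi host given as an argument, one at a time.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
restart_cfgagent() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh root@$host /etc/init.d/nsx-cfgagent restart"
        else
            echo "Restarting nsx-cfgagent on $host"
            ssh "root@$host" /etc/init.d/nsx-cfgagent restart \
                || echo "FAILED: $host" >&2
        fi
    done
}

# Example usage (placeholder host names):
#   DRY_RUN=1; restart_cfgagent esx-01.example.com esx-02.example.com
```

Running with DRY_RUN=1 first makes it possible to review the host list before any service is restarted.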

 

Permanent Fix:

A code fix will be included in a future release.