DFW rule(s) not working as expected after VM Hardware change or vMotion

Article ID: 326334

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • DFW rules do not work as expected after a live VM hardware change (for example, a vCPU hot add) or a vMotion.
 
  • ESXi host logs (vmkernel.log) display message(s) similar to:
2019-11-06T15:15:57.970Z cpu46:66391)Importing nic-12345-eth0-vmware-sfw.2, Version 500
2019-11-06T15:15:57.972Z cpu46:66391)Importing succeeded

Note: filter versions 5 and 6 are also impacted.
 
  • The DFW rule affected contains more than 7 listed ports as source or destination.
Example:
- Rule 1010 contains 13 listed ports and is affected by the issue:
#vsipioctl getrules -f nic-12345-eth0-vmware-sfw.2 | grep "rule 1010"
 rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013} accept;

- Rule 1011 contains only 3 listed ports. Although it covers many ports through a range, it is not affected by the issue:
#vsipioctl getrules -f nic-12345-eth0-vmware-sfw.2 | grep "rule 1011"
 rule 1011 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001-1013, 2001} accept;
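
The check described above can be sketched in Python as a helper that reads saved `vsipioctl getrules` output and reports rules with more than 7 listed ports. This is a minimal sketch, not a VMware tool: the function names are hypothetical, the regular expression assumes rule lines shaped like the examples in this article, and a range is counted as two listed ports (its endpoints), which is an assumption chosen because it reproduces the count of 3 given for rule 1011.

```python
import re

# Assumption: rule lines look like the `vsipioctl getrules` examples in this article.
RULE_RE = re.compile(r"rule\s+(\d+)\s+at\s+\d+\s.*?\bport\s+\{([^}]*)\}")
MAX_SAFE_PORTS = 7  # rules listing more than 7 ports are affected, per this article


def listed_port_count(rule_line):
    """Return (rule_id, number of listed port values), or None if no port set is found."""
    match = RULE_RE.search(rule_line)
    if match is None:
        return None
    # Every number inside the braces is counted, so "1001-1013" counts as 2 (assumption).
    return int(match.group(1)), len(re.findall(r"\d+", match.group(2)))


def rules_at_risk(getrules_output):
    """Return the IDs of rules whose port set lists more than MAX_SAFE_PORTS values."""
    at_risk = []
    for line in getrules_output.splitlines():
        parsed = listed_port_count(line)
        if parsed is not None and parsed[1] > MAX_SAFE_PORTS:
            at_risk.append(parsed[0])
    return at_risk


sample = """\
 rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013} accept;
 rule 1011 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001-1013, 2001} accept;
"""
print(rules_at_risk(sample))
```

With the two example rules from this article, only rule 1010 is reported.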


Environment

VMware NSX-T Data Center

Cause

The issue starts after a filter export/import operation is triggered. An export/import operation is triggered when a VM is vMotioned or when a live hardware change is performed (for example, a vCPU hot add).
Prior to NSX-T 2.5, some DFW filter versions have an issue where the export/import operation preserves only the first 7 ports of a rule.

As an example:
Rule prior to export/import:
#vsipioctl getrules -f nic-12345-eth0-vmware-sfw.2 | grep "rule 1010"
rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013} accept;

And after the export/import:
#vsipioctl getrules -f nic-12345-eth0-vmware-sfw.2 | grep "rule 1010"
rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007} accept;

This issue occurs only when a filter export/import operation is triggered. When the filter is programmed from the NSX Manager, this issue will not occur.
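
To confirm that a rule was truncated, the rule line captured before the export/import can be compared with the one captured afterwards. The following Python sketch does that comparison on saved `vsipioctl getrules` lines; `port_entries` and `dropped_ports` are hypothetical helper names, and the parsing assumes the line format shown in the examples above.

```python
import re

# Assumption: the port set appears as `port {...}` as in this article's examples.
PORT_SET_RE = re.compile(r"\bport\s+\{([^}]*)\}")


def port_entries(rule_line):
    """Return the comma-separated entries of the rule's port set, in order."""
    match = PORT_SET_RE.search(rule_line)
    if match is None:
        return []
    return [entry.strip() for entry in match.group(1).split(",") if entry.strip()]


def dropped_ports(before_line, after_line):
    """Return entries present before the export/import but missing afterwards."""
    remaining = set(port_entries(after_line))
    return [entry for entry in port_entries(before_line) if entry not in remaining]


before = ("rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 "
          "port {1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013} accept;")
after = ("rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 "
         "port {1001, 1002, 1003, 1004, 1005, 1006, 1007} accept;")

print(dropped_ports(before, after))
# → ['1008', '1009', '1010', '1011', '1012', '1013']
```

An empty result means the port set is unchanged between the two captures.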

Resolution

This issue is resolved in NSX-T 2.5.0.

Workaround:
As a temporary workaround, modify any DFW rule. This causes the NSX Manager to push the DFW configuration to all the ESXi hosts, and the DFW filter is then programmed correctly.

As a long-term workaround, if you do not want to upgrade, the following options are available:

Option 1: redesign DFW rules
Split each affected DFW rule into multiple rules, each with a maximum of 7 source or destination ports.

Example:
Initial rule:
rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013} accept;

After splitting the DFW Service into two DFW Services, where the first service includes 7 ports and the second service includes the remaining ports:
rule 1010 at 8 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1001, 1002, 1003, 1004, 1005, 1006, 1007} accept;
rule 1010 at 9 inout protocol tcp from addrset addrset-10 to addrset addrset-20 port {1008, 1009, 1010, 1011, 1012, 1013} accept;
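
For rules with long port lists, the grouping can be worked out ahead of time. The Python sketch below splits a list of individual ports into groups of at most 7; `split_ports` is a hypothetical helper, it only plans the grouping (it does not change any NSX configuration), and it assumes the list contains single ports rather than ranges.

```python
def split_ports(ports, max_per_service=7):
    """Split a list of ports into consecutive groups of at most max_per_service."""
    if max_per_service < 1:
        raise ValueError("max_per_service must be at least 1")
    return [ports[i:i + max_per_service]
            for i in range(0, len(ports), max_per_service)]


ports = list(range(1001, 1014))  # the 13 ports of rule 1010 in the example
groups = split_ports(ports)
for group in groups:
    print("port {" + ", ".join(str(port) for port in group) + "}")
```

For the 13 ports of rule 1010 this yields one group of 7 ports and one group of 6, matching the split shown above.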


Option 2: upgrade DFW filter version (only for NSX-T 2.4.x)
The issue described in this article is present in the DFW filter versions 5, 6 and 500. The issue is not present with the filter version 1000 introduced in NSX-T 2.4.x. 
In an NSX-T 2.4.x environment that has been upgraded from NSX-T 2.3.x, VMs may still use the affected filter versions.
To move the VMs to filter version 1000, use one of the following options:

1. Change the DFW filter version using the following commands:
    a. Find the DFW filter for the VM:
    #summarize-dvfilter
    Example:
    world 611217 vmm0:MY_TEST_VM vcUuid:'50 08 ef 5b 94 e7 89 96-89 35 3f 66 94 66 7f 28'
     port 50331664 UPSA - VM C.eth0
      vNic slot 2
       name: nic-12345-eth0-vmware-sfw.2 <<-- this is the DFW filter
    (output omitted)


    b. Change the DFW filter version to 1000:
    #vsipioctl setexportversion -f {filtername} -e 1000

2. Power off and power on the VMs that use a filter version earlier than 1000.

3. Add the VMs that use a filter version earlier than 1000 to the DFW exclusion list, then remove them from it.
In the NSX-T UI go to Advanced Network & Security > Security > Distributed Firewall > Exclusion List
Note: this workaround is disruptive, as flows may be dropped when the VM is removed from the DFW exclusion list and DFW is applied to it again.

4. Disable and re-enable DFW globally.
In the NSX-T UI go to Advanced Network & Security > Security > Distributed Firewall > Settings
Note: this workaround is disruptive as flows may be dropped when DFW is re-enabled.
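
When many VMs on a host need the first option, the `setexportversion` command lines can be prepared from saved `summarize-dvfilter` output. The Python sketch below only builds the command strings for review and does not execute anything; the helper names are hypothetical, and the pattern assumes DFW filter names end in `vmware-sfw.<slot>` as in the example in this article.

```python
import re

# Assumption: DFW filters appear on "name:" lines and end in "vmware-sfw.<slot>".
FILTER_NAME_RE = re.compile(r"^\s*name:\s+(\S*vmware-sfw\.\d+)")


def dfw_filter_names(summarize_output):
    """Return the DFW filter names found in summarize-dvfilter output."""
    names = []
    for line in summarize_output.splitlines():
        match = FILTER_NAME_RE.match(line)
        if match is not None:
            names.append(match.group(1))
    return names


def setexportversion_commands(filter_names, version=1000):
    """Build one `vsipioctl setexportversion` command line per filter name."""
    return [f"vsipioctl setexportversion -f {name} -e {version}"
            for name in filter_names]


sample = """\
world 611217 vmm0:MY_TEST_VM vcUuid:'50 08 ef 5b 94 e7 89 96-89 35 3f 66 94 66 7f 28'
 port 50331664 UPSA - VM C.eth0
  vNic slot 2
   name: nic-12345-eth0-vmware-sfw.2
   agentName: vmware-sfw
"""
commands = setexportversion_commands(dfw_filter_names(sample))
for command in commands:
    print(command)
# → vsipioctl setexportversion -f nic-12345-eth0-vmware-sfw.2 -e 1000
```

Reviewing the generated lines before running them on the host helps confirm that only the intended filters are changed.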


To verify the DFW filter version, follow the steps below:
1. Find the DFW filter for the VM:
#summarize-dvfilter

Example:
world 611217 vmm0:MY_TEST_VM vcUuid:'50 08 ef 5b 94 e7 89 96-89 35 3f 66 94 66 7f 28'
 port 50331664 UPSA - VM C.eth0
  vNic slot 2
   name: nic-12345-eth0-vmware-sfw.2 <<-- this is the DFW filter
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Attached
(output omitted)


2. Run the following vsipioctl command:
#vsipioctl getfilterstat -f {filter-sfw.2}

Example:
#vsipioctl getfilterstat -f nic-12345-eth0-vmware-sfw.2
(output omitted)
FILTER INFO
-----------
sessions:       15078
flags:          0xe40
states:         322
rules:          14
table count:    14
filter version: 500  <---------- Filter version
ruleset gen:    24
hash:           23171
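
The version check can also be done on saved `vsipioctl getfilterstat` output. The following Python sketch extracts the `filter version` field and compares it with the affected versions named in this article (5, 6 and 500); the function names are hypothetical, and the parsing assumes the field layout shown in the example above.

```python
import re

AFFECTED_VERSIONS = {5, 6, 500}  # filter versions listed as affected in this article
VERSION_RE = re.compile(r"^\s*filter version:\s*(\d+)", re.MULTILINE)


def filter_version(getfilterstat_output):
    """Return the filter version as an int, or None if the field is missing."""
    match = VERSION_RE.search(getfilterstat_output)
    return int(match.group(1)) if match is not None else None


def is_affected(getfilterstat_output):
    """Return True if the reported filter version is one of the affected versions."""
    return filter_version(getfilterstat_output) in AFFECTED_VERSIONS


sample = """\
FILTER INFO
-----------
sessions:       15078
rules:          14
filter version: 500
ruleset gen:    24
"""
print(filter_version(sample), is_affected(sample))
# → 500 True
```

A filter reporting version 1000 returns False from `is_affected`, as does output in which the field cannot be found, so a missing field should be checked separately with `filter_version`.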