Symptoms:
- Affected versions: NSX-T Data Center 3.2.0.x and 3.2.1.x
DFW Symptoms
- The Distributed Firewall (DFW) stops enforcing rules on all vNICs after a policy is edited, or after DFW is disabled or enabled globally.
- No alert or error is generated when the rules are edited or published.
- On ESXi, summarize-dvfilter shows a filter assigned to the VM, but vsipioctl returns no rules or address sets for that filter:
[root@esxi:~] summarize-dvfilter | grep <VM name> -A2
world 20533096 vmm0:<VM name> vcUuid:'50 06 bc 04 6a 40 65 47-08 cc 19 01 20 b1 c4 2c'
port 67108957 <VM Name>.eth0
vNic slot 2
name: nic-20533096-eth0-vmware-sfw.2 << Filter name
[root@esxi:~] vsipioctl getaddrsets -f nic-20533096-eth0-vmware-sfw.2
No address sets.
[root@esxi:~] vsipioctl getrules -f nic-20533096-eth0-vmware-sfw.2
No rules.
- Logging of matched rules stops. On ESXi, /var/log/dfwpktlog.log shows no new entries since the problem started.
- On ESXi, /var/log/nsx-syslog.log shows lines similar to the following:
2023-03-23T14:05:02.842Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AC5FA7C0" level="info"] Decoder: Received DFW RuleSectionMsg msg (Operation SET): 45c11eea-f71e-4591-a595-e7675a20902e, pri: 60000010 <--- UUID is the DFW Policy Set being updated.
...
2023-03-23T14:05:02.843Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AD01E700" level="info"] dfw: Update runtime status to nestdb (error, meta info): 1102, <-- This is returned in the UI under the Status "unknown".
2023-03-23T14:05:02.843Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AD01E700" level="error" errorCode="LCP01158"] dfw: build DfwCache failed: unknown issue.
...
2023-03-23T14:05:02.844Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AD01E700" level="error" errorCode="LCP01155"] dfw: Failed to process request
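The log signature above can be searched for programmatically. The following is a minimal sketch, not a supported tool: it assumes the log lines have the same layout as the samples shown, and the function name find_dfw_failures is hypothetical. The two error codes are the ones that appear in the sample; other codes may exist.

```python
import re

# Error codes copied from the sample nsx-syslog.log lines in this article.
DFW_ERROR_CODES = ("LCP01158", "LCP01155")

ERROR_CODE_RE = re.compile(r'errorCode="(LCP\d+)"')


def find_dfw_failures(lines):
    """Return (timestamp, error_code, message) for cfgAgent lines
    that carry one of the known DFW error codes."""
    hits = []
    for line in lines:
        # Only cfgAgent entries are of interest here.
        if 'subcomp="cfgAgent"' not in line:
            continue
        match = ERROR_CODE_RE.search(line)
        if match is None or match.group(1) not in DFW_ERROR_CODES:
            continue
        # Assumption: the timestamp is the first space-separated field and
        # the message is whatever follows the closing "] " of the header.
        timestamp = line.split(" ", 1)[0]
        message = line.rsplit("] ", 1)[-1].strip()
        hits.append((timestamp, match.group(1), message))
    return hits


SAMPLE = [
    '2023-03-23T14:05:02.843Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AD01E700" level="error" errorCode="LCP01158"] dfw: build DfwCache failed: unknown issue.',
    '2023-03-23T14:05:02.844Z cfgAgent[1970112]: NSX 1970112 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="AD01E700" level="error" errorCode="LCP01155"] dfw: Failed to process request',
]

for hit in find_dfw_failures(SAMPLE):
    print(hit)
```

To run it against a real host, copy /var/log/nsx-syslog.log off the ESXi host and pass the opened file object to find_dfw_failures in place of SAMPLE.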
Gateway Firewall Symptoms
- The Edge datapathd service crashes regularly.
- On the Edge node, /var/log/syslog contains entries similar to the following:
2022-12-03T09:29:39.805Z edge04m datapath-systemd-helper 21799 - - 2022-12-03T09:29:39Z datapathd 21864 firewallcp [ERROR] DfwChannel: Failed to update dfw cache due to exception: too many TCP/UDP port: 16
2022-12-03T09:29:49.560Z edge04 NSX 22823 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.datapathd.1670059789.21864.0.9.gz
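The two Edge log entries above can be picked out of a copied syslog file with a short script. This is a minimal sketch under the assumption that the message text matches the samples exactly; the function name scan_edge_syslog is hypothetical.

```python
import re

# Patterns copied from the sample Edge /var/log/syslog entries in this article.
PORT_ERROR_RE = re.compile(
    r"Failed to update dfw cache due to exception: too many TCP/UDP port: (\d+)"
)
CORE_FILE_RE = re.compile(r"Core file generated: (\S*core\.datapathd\.\S+)")


def scan_edge_syslog(lines):
    """Return (port_counts, core_files): the port counts reported by the
    datapathd exception and the paths of any datapathd core files."""
    port_counts = []
    core_files = []
    for line in lines:
        port_match = PORT_ERROR_RE.search(line)
        if port_match:
            port_counts.append(int(port_match.group(1)))
        core_match = CORE_FILE_RE.search(line)
        if core_match:
            core_files.append(core_match.group(1))
    return port_counts, core_files


SAMPLE = [
    "2022-12-03T09:29:39.805Z edge04m datapath-systemd-helper 21799 - - 2022-12-03T09:29:39Z datapathd 21864 firewallcp [ERROR] DfwChannel: Failed to update dfw cache due to exception: too many TCP/UDP port: 16",
    '2022-12-03T09:29:49.560Z edge04 NSX 22823 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.datapathd.1670059789.21864.0.9.gz',
]

print(scan_edge_syslog(SAMPLE))
```

A datapathd core file that appears shortly after a "too many TCP/UDP port" exception is consistent with the symptom described here; core files on their own can have other causes.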
NSX UI Symptoms
- Under Security > Distributed Firewall, the Policies view may show a status of "Unknown". Clicking the "Unknown" status opens "Status on Transport Nodes", which shows the status "Failed" and returns the error: "[Error Code = '1102', Error Message = '', Affected Entities = '[]'.]"
- The service used in the affected rule has a port set with a count of 16.
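To look for other services with the same characteristic, a service definition exported as JSON can be checked for large port sets. The sketch below is illustrative only. The field names service_entries, source_ports, destination_ports and display_name are assumptions about the exported structure and should be verified against your NSX version. The threshold of 16 is the count reported in this article, not a documented limit, and each list item (single port or range) is assumed to count as one.

```python
def find_large_port_sets(service, threshold=16):
    """Return (entry_name, field, count) for every port list in the
    service whose item count is at or above the threshold.

    Assumption: 'service' is a dict with a 'service_entries' list, and
    each entry may carry 'source_ports' and 'destination_ports' lists.
    """
    flagged = []
    for entry in service.get("service_entries", []):
        for field in ("source_ports", "destination_ports"):
            count = len(entry.get(field, []))
            if count >= threshold:
                name = entry.get("display_name", "<unnamed>")
                flagged.append((name, field, count))
    return flagged


# Hypothetical service with 16 destination ports, for illustration only.
example_service = {
    "display_name": "example-service",
    "service_entries": [
        {
            "display_name": "example-entry",
            "destination_ports": [str(8000 + i) for i in range(16)],
            "source_ports": [],
        }
    ],
}

print(find_large_port_sets(example_service))
```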