DFW FQDN filtering works at Layer 7, and only when a Layer 7 DNS Context Profile DFW rule is placed above the FQDN rule.
Without these L7 DNS rules, FQDN-based rules will not be enforced properly.
In addition, legacy (non-Turbo Mode) IDPS is in use. IDPS also requires inspection at Layer 7.
During peak DNS traffic times, you notice that some DNS queries are timing out. In addition, you notice that some L7 error counters are increasing,
per the output from vsipioctl getfilterstats on the DNS server VM:
DROP REASON
-----------
short: 701
state-insert: 7
strict no syn: 29015082
L7 attr error: 310223336
match drop rule rx packets: 366913427
match drop rule tx packets: 83571016
state-mismatch: 1842
3wh error: 208
seqno outside window: 508
seqno old retrans: 412
seqno old ack: 400
seqno bad ack: 496
seqno gt maxack: 508
seqno lt minack: 412
MISCELLANEOUS
-------------
src-limit: 4932
pkts-frag-queued-v4: 604
L7 pending: 1662467517
Both L7 attr error and L7 pending counters increase at a rapid rate.
When these L7-based DNS rules are changed to L4-based rules, the packet loss ceases.
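To quantify how quickly the counters are increasing, two captures of the vsipioctl getfilterstats output taken a known interval apart can be compared. The following is a minimal Python sketch, not an official tool. It assumes the output has the same "name: value" layout as the excerpt above. The helper names (parse_filter_stats, counter_rates) and the second sample are hypothetical and for illustration only.

```python
import re


def parse_filter_stats(text):
    """Return {counter_name: int} for every 'name: value' line.

    Section headings (e.g. DROP REASON) and the dashed rules under them
    carry no value and are skipped.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def counter_rates(before, after, interval_s):
    """Per-second increase of each counter present in both samples."""
    return {
        name: (after[name] - before[name]) / interval_s
        for name in after
        if name in before
    }


# Excerpt of the output shown above, used as the first sample.
FIRST_SAMPLE = """\
DROP REASON
-----------
short: 701
L7 attr error: 310223336
MISCELLANEOUS
-------------
L7 pending: 1662467517
"""

if __name__ == "__main__":
    first = parse_filter_stats(FIRST_SAMPLE)
    # Hypothetical second sample taken 60 seconds later; in practice,
    # parse a second real capture instead of fabricating deltas.
    second = dict(first)
    second["L7 attr error"] += 120_000
    second["L7 pending"] += 600_000
    for name, rate in counter_rates(first, second, 60).items():
        print(f"{name}: {rate:.1f}/s")
```

Counters that show a sustained non-zero rate during peak DNS traffic, particularly L7 attr error and L7 pending, are the ones to watch.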
VMware NSX 4.x
vDefend Firewall
FQDN filtering
DNS L7 rules
IDPS (legacy)
L4 DFW rules are implemented in a kernel module called VSIP, which cannot handle the complex operations required by L7 rules. VSIP therefore sends any traffic requiring L7 inspection to a userspace process, which operates at ~1-2 Gbps per host. This area is also known as the "Slow Path". Layer 4 operations reside in the Fast Path area, at speeds of ~9 Gbps.
Inspecting the DVFilter stats for the DNS server VM, we see a non-zero faulting_err count in the DVFilter slow path agent stats:
[ESXi:~] vsish
/> cat /net/dvFilter/slowpaths/3/stats
dvFilter slow path agent stats {
world_id:49639333
kernel_rx:18678817612
kernel_tx:18678817605
user_rx:18678817609
user_tx:18678817605
faulting_err:1887311
injecting_err:0
deferredPktCnt:0
The faulting_err indicates congestion in the DVFilter. This can happen if the amount of traffic exceeds the capacity of the Slow Path.
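As a rough way to put the faulting_err count in context, the stats can be parsed and the error count expressed as a fraction of the packets received from the kernel. This is a minimal Python sketch that assumes the "name:value" layout shown above. The function names are hypothetical, and the ratio is only an illustrative indicator, not a documented threshold.

```python
import re


def parse_slowpath_stats(text):
    """Return {field: int} for every 'field:value' line in the stats output."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def faulting_ratio(stats):
    """faulting_err as a fraction of kernel_rx (0.0 if kernel_rx is absent or zero)."""
    rx = stats.get("kernel_rx", 0)
    return stats.get("faulting_err", 0) / rx if rx else 0.0


# Excerpt of the slow path agent stats shown above.
SLOWPATH_SAMPLE = """\
dvFilter slow path agent stats {
world_id:49639333
kernel_rx:18678817612
kernel_tx:18678817605
user_rx:18678817609
user_tx:18678817605
faulting_err:1887311
injecting_err:0
deferredPktCnt:0
"""

if __name__ == "__main__":
    stats = parse_slowpath_stats(SLOWPATH_SAMPLE)
    print(f"faulting_err: {stats['faulting_err']}")
    print(f"faulting_err / kernel_rx: {faulting_ratio(stats):.6%}")
```

Because these counters are cumulative, comparing two captures taken during peak traffic gives a better picture of ongoing congestion than a single snapshot.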
1. Enable Turbo Mode in order to increase the throughput of traffic from ~1-2 Gbps (Slow Path area) to ~9 Gbps (Fast Path area), per host. Turbo Mode will utilize the kernel's Fast Path for all traffic that requires L7 inspection and improve overall performance.
The Turbo Mode feature is described further here:
https://techdocs.broadcom.com/us/en/vmware-security-load-balancing/vdefend/vdefend-atp/4-2/nsx-ids-ips-and-nsx-malware-prevention/ddpi-engine.html
2. Revisit how IDS/IPS rules are configured, as this can also contribute to an unnecessary increase in traffic. Best practice dictates that only critical workloads should match these rules.
The guidance documented below provides recommended practices for deploying IDPS in a scalable and performant manner.
https://knowledge.broadcom.com/external/article/313654/nsx-advanced-firewall-idps-performance-t.html