Some DNS queries time-out whenever DNS L7 Context Profile rules are in use

Article ID: 424724

Products

VMware NSX
VMware vDefend Firewall

Issue/Introduction

DFW FQDN filtering operates at Layer 7 and works only when a Layer 7 DNS Context Profile DFW rule is placed above the FQDN rule.
Without these L7 DNS rules, FQDN-based rules are not enforced properly.

In addition, legacy (non-Turbo Mode) IDPS is in use, and IDPS also requires inspection at Layer 7.

During peak DNS traffic times, some DNS queries time out. At the same time, some L7 error counters increase,
as shown in the output of vsipioctl getfilterstats for the DNS server VM:

DROP REASON
-----------
short:                701
state-insert:         7
strict no syn:        29015082
L7 attr error:        310223336
match drop rule rx packets: 366913427
match drop rule tx packets: 83571016
state-mismatch:       1842
  3wh error:            208
  seqno outside window: 508
  seqno old retrans:    412
  seqno old ack:        400
  seqno bad ack:        496
  seqno gt maxack:      508
  seqno lt minack:      412

MISCELLANEOUS
-------------
src-limit:            4932
pkts-frag-queued-v4:  604
L7 pending:           1662467517

 

Both L7 attr error and L7 pending counters increase at a rapid rate.
When these L7-based DNS rules are changed to L4-based rules, the packet loss ceases.
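
To confirm that the two counters really are climbing, the getfilterstats output can be captured twice and the difference computed. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, the first snapshot reuses values from the output above, and the second snapshot and the 60-second interval are made-up illustrative values.

```python
import re

def parse_filter_stats(text):
    """Parse 'name: value' counter lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        # Counter names may contain spaces and digits, e.g. "L7 attr error".
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters

def counter_rates(before, after, interval_s,
                  names=("L7 attr error", "L7 pending")):
    """Return the per-second increase of the selected counters."""
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / interval_s
        for name in names
    }

# First snapshot: values taken from the output shown in this article.
snapshot_1 = """
DROP REASON
-----------
L7 attr error:        310223336
MISCELLANEOUS
-------------
L7 pending:           1662467517
"""

# Second snapshot: HYPOTHETICAL values, assumed to be captured 60 s later.
snapshot_2 = """
L7 attr error:        310283336
L7 pending:           1662587517
"""

rates = counter_rates(parse_filter_stats(snapshot_1),
                      parse_filter_stats(snapshot_2),
                      interval_s=60)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per second")
```

A sustained non-zero rate during peak DNS traffic, dropping to zero once the rules are changed to L4, would be consistent with the behavior described above.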

Environment

VMware NSX 4.x
vDefend Firewall
FQDN filtering 
DNS L7 rules
IDPS (legacy)

Cause

L4 DFW rules are implemented in a kernel module called VSIP, which cannot handle the complex operations required by L7 rules. VSIP therefore sends any traffic that requires L7 inspection to a user-space process, which operates at ~1-2 Gbps per host. This area is also known as the "Slow Path". Layer 4 operations reside in the Fast Path area, at speeds of ~9 Gbps.

Inspecting the DVFilter stats for the DNS server VM shows a non-zero count for faulting_err:

[ESXi:~] vsish
 
/> cat /net/dvFilter/slowpaths/3/stats
dvFilter slow path agent stats {
   world_id:49639333
   kernel_rx:18678817612
   kernel_tx:18678817605
   user_rx:18678817609
   user_tx:18678817605
   faulting_err:1887311 
   injecting_err:0
   deferredPktCnt:0

 

A non-zero faulting_err count indicates congestion in the DVFilter. This can happen when the amount of traffic exceeds the capacity of the Slow Path.
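
The check on the slow path stats can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and dividing faulting_err by kernel_rx is an assumption used here as a rough indicator, not a documented metric. The sample text reuses the values shown above.

```python
import re

def parse_slowpath_stats(text):
    """Parse 'key:value' lines from the slow path stats output into ints."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

def faulting_ratio(stats):
    """faulting_err divided by kernel_rx (ASSUMED rough congestion indicator)."""
    rx = stats.get("kernel_rx", 0)
    return stats.get("faulting_err", 0) / rx if rx else 0.0

# Values copied from the stats output shown in this article.
sample = """
dvFilter slow path agent stats {
   world_id:49639333
   kernel_rx:18678817612
   kernel_tx:18678817605
   user_rx:18678817609
   user_tx:18678817605
   faulting_err:1887311
   injecting_err:0
   deferredPktCnt:0
"""

stats = parse_slowpath_stats(sample)
if stats.get("faulting_err", 0) > 0:
    print(f"faulting_err is non-zero: {stats['faulting_err']}")
    print(f"faulting_err / kernel_rx = {faulting_ratio(stats):.6%}")
```

Because these counters are cumulative, comparing two captures taken some time apart gives a better picture of whether the congestion is still occurring than a single reading does.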

Resolution

1. Enable Turbo Mode to increase the throughput of traffic from ~1-2 Gbps (Slow Path area) to ~9 Gbps (Fast Path area), per host. Turbo Mode uses the kernel's Fast Path for all traffic that requires L7 inspection, which improves overall performance.

The Turbo Mode feature is described further here:
https://techdocs.broadcom.com/us/en/vmware-security-load-balancing/vdefend/vdefend-atp/4-2/nsx-ids-ips-and-nsx-malware-prevention/ddpi-engine.html

2. Revisit how IDS/IPS rules are configured, as this can also contribute to an unnecessary increase in traffic. Best practice dictates that only critical workloads should match these rules.
The guidance linked below provides recommended practices for deploying IDPS in a scalable and performant manner.
https://knowledge.broadcom.com/external/article/313654/nsx-advanced-firewall-idps-performance-t.html