Symptoms:
NSX-T version 3.2.x and later
NSX version 4.x and later
A firewall threshold monitor alarm for the vsip-attr heap is raised in the NSX UI against one or more ESXi hosts.
Cause:
This occurs when Layer 7 (L7) Distributed Firewall (DFW) rules process a high volume of traffic, causing vsip-attr heap usage to exceed its threshold.
Example:
An environment with L7 DNS rules applied to heavily trafficked Domain Controller VMs may experience this issue. The large number of DNS queries inspected by the L7 rule can cause vsip-attr heap usage to exceed the 75% threshold.
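As a rough illustration using the 1280 MB vsip-attr heap size shown in the output further down, the 75% level corresponds to about 0.75 x 1280 MB ≈ 960 MB of attribute state, so a sustained volume of L7-inspected DNS flows can cross it fairly quickly.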
Workaround
Permanent Fix
Identify the affected VM and its Slot 2 NIC filter. We will look at the "L7 attr error" counter further below to confirm the issue. You will need the information below to find the affected VM(s), and you may have to run the commands against multiple VMs for a full comparison.
SSH into the affected ESXi host with the alarm.
Run the command below to check the heap usage and match it to the UI alarm.
[root@esxi-customer:~] nsxcli -c get firewall thresholds
Firewall Threshold Monitors
------------------------------------------------------------------------------------------------
#   Name               Raised  Threshold  CurrValue  CurrSize  MaxSize   PeakEver  EverTime(ago)
1   cfgagent           False   100        5          139 MB    2500 MB   5         11d 12:41:18
2   dfw-cpu            False   90         0          --        --        52        13:48:26
3   dfw-session        False   80         5          --        --        23        4d 01:27:27
4   nsx-exporter       False   100        19         149 MB    768 MB    22        2d 05:40:06
5   nsx-idps           False   100        14         302 MB    2048 MB   15        11d 21:39:52
6   vdpi               False   100        47         485 MB    1024 MB   47        4d 13:37:44
7   vsip-attr          False   90         89         1144 MB   1280 MB   93        6d 08:59:47   <-------HERE
8   vsip-flow          False   90         0          2 MB      768 MB    0         --:--:--
9   vsip-fprules       False   90         0          6 MB      2560 MB   0         --:--:--
10  vsip-fqdn          False   90         2          12 MB     512 MB    3         2d 23:12:18
11  vsip-ipreputation  False   90         0          0 MB      256 MB    0         --:--:--
12  vsip-module        False   90         21         539 MB    2560 MB   24        11d 13:26:58
13  vsip-rules         False   90         0          5 MB      3070 MB   0         --:--:--
14  vsip-si            False   90         0          0 MB      128 MB    0         --:--:--
15  vsip-state         False   90         0          0 MB      512 MB    0         --:--:--
This ESXi host currently has the alarm raised in the NSX-T UI. The vsip-attr heap usage (CurrValue) is at 89%, just under its 90% threshold.
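If you only want to watch the vsip-attr row over time, the output can be narrowed with a standard grep. This is a minimal sketch; the nsxcli command is the same one shown above and grep is not NSX-specific:
[root@esxi-customer:~] nsxcli -c get firewall thresholds | grep -i vsip-attr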
--------------------------------------------------------
Next, run = summarize-dvfilter | grep -i "VMname" -A9 (replace "VMname" with the name of the suspect VM)
In the output, note the Slot 2 filter name, for example: name: nic-2580247-eth0-vmware-sfw.2
Use this nic-<number>-eth0-vmware-sfw.2 filter name in the remaining commands.
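As a sketch only, the lookup can be narrowed to just the firewall filter line; "DC01" below is a placeholder VM name, substitute the suspect VM:
run = summarize-dvfilter | grep -i "DC01" -A9 | grep "name:.*vmware-sfw"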
run = vsipioctl getfilterstat -f nic-6090470-eth0-vmware-sfw.2 | grep -E "DROP REASON | L7 attr error"
DROP REASON
L7 attr error: 423066 <---------HERE
This counter increases while the issue is occurring and keeps incrementing. A vMotion of the VM may reset some of these counters. Run the command again later to see how much the L7 attr error count has increased.
run = vsipioctl getfilterstat -f nic-6090470-eth0-vmware-sfw.2 | grep -E "DROP REASON | L7 attr error"
DROP REASON
L7 attr error: 425066 <--------HERE
We can see the L7 attr error drop count has increased (from 423066 to 425066).
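To avoid re-running the check by hand, a small shell loop on the ESXi host can sample the counter periodically. This is a sketch only; the filter name and 300-second interval are examples, substitute your own values:
while true; do
    date                                                              # timestamp each sample
    vsipioctl getfilterstat -f nic-6090470-eth0-vmware-sfw.2 | grep "L7 attr error"
    sleep 300                                                         # wait 5 minutes between samples
done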
-----------------------------------------------------
Next, run the following command to get the flow counts for the filter. In this example output there is a large number of active UDP flows; the other flow counts are useful context, but we will focus on UDP.
run = vsipioctl getfloodstat -f nic-2109705-eth0-vmware-sfw.2
UDP:
Flood Protection : disabled
Active UDP Flows = 26063 <------HERE
ICMP:
Flood Protection : disabled
Active ICMP Flows = 20
OTHER:
Flood Protection : disabled
Active OTHER Flows = 0
TCP Half-open:
Flood Protection : disabled
Active TCP Flows = 51
This Domain Controller VM was the cause of the vsip-attr heap exceeding its threshold.
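If several VMs on the host are candidates, a sweep such as the sketch below compares active UDP flow counts across every firewall filter. It assumes the filter names follow the nic-<number>-eth<n>-vmware-sfw.<slot> pattern shown above:
for f in $(summarize-dvfilter | grep -o "nic-[0-9]*-eth[0-9]*-vmware-sfw\.[0-9]*" | sort -u); do
    echo "== $f =="                                                   # filter (VM vNIC) being checked
    vsipioctl getfloodstat -f "$f" | grep "Active UDP Flows"
done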
At this point, you can proceed with any of the options listed in the Workaround section.