In NSX-T environments, ESXi hosts may experience a PSOD during or after an upgrade to NSX 4.2.x when the number of DFW rules on the host exceeds the supported configuration limits.
VMware NSX-T Data Center 3.2.x and 4.2.x
VMware vSphere ESXi 7.x and later
vDefend Firewall (DFW) enabled.
In NSX 3.2.x, DFW rules and address sets/groups were stored in a single shared 3 GB heap. This allowed rule counts above the supported limits to be realized, although such configurations were unsupported.
Starting in NSX 4.2.1, heap memory allocation was redesigned:
3 GB heap dedicated for address sets/groups (vsip-kentries)
1 GB heap allocated exclusively for DFW rules (vsip-rules)
When the DFW rule count on an ESXi host exceeds the supported maximum (~120K per host, ≤4K per vNIC), the vsip-rules heap is exhausted, leading to memory allocation failures.
Example snippet of the vsip-rules heap and its utilization from the support bundle.
Note: The following command can be used to review firewall thresholds on a live host:
ESXi:~] nsxcli
ESXi> get firewall thresholds
This condition can leave firewall rules in an inconsistent state and may cause an ESXi host PSOD.
Symptoms:
ESXi host crashes (PSOD) during or after the NSX upgrade, with a backtrace similar to the following:
(gdb) bt
#0 fp2_rulematch_set (rs=0x************, rs_num=<optimized out>, rule=<optimized out>, flags=3,
fprl=0x************, kif=0x************, str="wildcard") at …/pf_policy_lookup.c:****
#1 pf_sort_wildcard_rules (pd=0x************, rs_num=1, max_nr=0x************, rlist=0x************,
rs=0x************, kif=0x************) at …/pf_policy_lookup.c:****
#2 pfp_policy_lookup (kif=0x************, policy_lookup_ctrl=0x************, ruleset=0x************,
pd=0x************, sport=<optimized out>, dport=<optimized out>, direction=2, ac=0x0,
curr_attr_state=0x************, tm=0x************) at …/pf_policy_lookup.c:****
#3 0x**************** in pf_test_tcp (rm=0x************, jump_rm=0x************, ids_rm=0x************,
sm=0x************, prlists=<optimized out>, direction=2, kif=0x************, m=0x************,
off=20, h=0x************, rlookup=1, rule_type=0, curr_attr_state=0x************,
next_attr_state=0x************, ac=0x0, sip_persist=0x0, lb_ctx=0x0, reason=0x************,
pd=0x************, ethtype=8, am=0x************, rsm=0x************, ifq=0x0, inp=0x0)
at …/pf.c:****
#4 0x**************** in pf_validate_state_v2 (kif=0x************, state=0x************, rule=0x************,
jump_rule=0x************, ids_rule=0x************, anchor_rule=0x************, orig_pd=0x************,
ethtype=8, paction=0x************, rule_type=0, next_attr_state=0x************, waslocked=0)
at …/pf.c:****
#5 0x**************** in pf_validate_session_v2 (kif=0x************, m=<optimized out>, state=0x************,
pd=0x************, ethtype=<optimized out>, direction=<optimized out>, waslocked=0) at …/pf.c:****
#6 0x**************** in pf_validate_session (direction=2, ethtype=8, pd=0x************, state=<optimized out>,
m=0x************, kif=0x************) at …/pf.c:****
#7 pf_test_state_tcp (state=0x************, direction=2, kif=0x************, m=0x************, off=20,
h=0x************, pd=0x************, ethtype=8, reason=0x************, check_only=0,
check_dnat_out=0, drop_rst=0x************) at …/pf.c:****
#8 0x**************** in pf_test (dir=2, ifp=0x************, m0=0x************, eh=0x************,
ethHdrLen=14, ethtype=8, inp=0x0, metadata=0x************, check_only=0, flow_entry=0x************)
at …/pf.c:****
#9 0x**************** in PFFilterPacket (cookie=0x************, fragsList=0x************,
dvDir=VMK_DVFILTER_TO_SWITCH, source=<optimized out>, verdict=0x************,
checkStateOnly=<optimized out>, flowMetaData=0x************) at …/glue.c:****
#10 0x**************** in VSIPFWProcessPackets (solution=0x************, filter=0x************,
pktList=0x************, direction=VMK_DVFILTER_TO_SWITCH, source=VSIP_DVFILTER_SOURCE_REGULAR,
action=0x************, checkStateOnly=0, flowMetaData=0x************) at …/vsip_fw.c:****
#11 0x**************** in VSIPDVFProcessPacketsInt (filterImpl=0x************, pktList=<optimized out>,
direction=<optimized out>, ensData=<optimized out>) at …/vsip_dvfilter.c:****
#12 0x**************** in ?? ()
#13 0x0000000000000000 in ?? ()
High DFW rule counts observed per host (400K–700K).
vmkernel logs report memory allocation failures during rule commit.
Rule realization failures when publishing firewall policies.
Workaround:
Reduce the effective rule count per ESXi host to within supported configuration maximums:
≤120,000 rules per host
≤4,000 rules per vNIC
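The two limits above can be checked mechanically once per-vNIC rule counts have been collected. The following is a minimal sketch, not an NSX tool: the input mapping, the vNIC names, and the function name check_rule_counts are hypothetical, and it assumes the per-host total can be approximated as the sum of the per-vNIC counts.

```python
# Supported maximums quoted in this article.
MAX_RULES_PER_HOST = 120_000
MAX_RULES_PER_VNIC = 4_000


def check_rule_counts(vnic_rule_counts):
    """Return a list of limit violations for one ESXi host.

    vnic_rule_counts is a hypothetical mapping of vNIC name -> rule count,
    collected by whatever inventory method is available in the environment.
    Assumption: the host total is the sum of the per-vNIC counts.
    """
    violations = []
    for vnic, count in sorted(vnic_rule_counts.items()):
        if count > MAX_RULES_PER_VNIC:
            violations.append(
                f"{vnic}: {count} rules exceeds per-vNIC limit {MAX_RULES_PER_VNIC}"
            )
    total = sum(vnic_rule_counts.values())
    if total > MAX_RULES_PER_HOST:
        violations.append(
            f"host: {total} rules exceeds per-host limit {MAX_RULES_PER_HOST}"
        )
    return violations


if __name__ == "__main__":
    # Illustrative numbers only: 40 vNICs with 3,500 rules each.
    sample = {f"vnic-{i}": 3_500 for i in range(40)}
    for line in check_rule_counts(sample):
        print(line)
```

In the illustrative sample every vNIC is under the per-vNIC limit, yet the host total (140,000) is over the per-host limit, which is why both limits need to be checked.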
Recommended approaches:
Add non-essential or container-based VMs to the DFW exclusion list.
Where appropriate, move “ANY ANY ALLOW” type rules to the top to avoid unnecessary rule expansion.
Audit and remove redundant, duplicate, or irrelevant firewall rules.
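As an illustration of the audit step, the sketch below groups rules that have identical match fields and action, which are candidates for consolidation. It is a hedged example only: the rule dictionaries and their field names (id, sources, destinations, services, action) are hypothetical and do not reflect the NSX API schema, and the rules are assumed to have been exported into this shape beforehand.

```python
from collections import defaultdict


def find_duplicate_rules(rules):
    """Return lists of rule IDs whose match fields and action are identical.

    Each rule is a hypothetical dict with keys: id, sources, destinations,
    services, action. Member order inside each field is ignored.
    """
    seen = defaultdict(list)
    for rule in rules:
        key = (
            frozenset(rule["sources"]),
            frozenset(rule["destinations"]),
            frozenset(rule["services"]),
            rule["action"],
        )
        seen[key].append(rule["id"])
    # Keep only keys shared by more than one rule.
    return [ids for ids in seen.values() if len(ids) > 1]


if __name__ == "__main__":
    exported = [
        {"id": 1, "sources": ["web", "app"], "destinations": ["db"],
         "services": ["tcp/3306"], "action": "ALLOW"},
        {"id": 2, "sources": ["app", "web"], "destinations": ["db"],
         "services": ["tcp/3306"], "action": "ALLOW"},
        {"id": 3, "sources": ["web"], "destinations": ["db"],
         "services": ["tcp/3306"], "action": "DROP"},
    ]
    print(find_duplicate_rules(exported))
```

This only detects exact duplicates. Shadowed or overlapping rules need a review of rule order and group membership, which this sketch does not attempt.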
Resolution:
This is not a software defect; the behavior is by design starting with NSX 4.2.1 due to dedicated heap allocations.
Ensure that firewall rule design follows supported configuration maximums.
Best practices:
Use the “Applied To” field with security groups instead of applying rules globally at the “DFW” level. This prevents unnecessary rule replication across all vNICs.
Periodically audit firewall rules to eliminate redundancy.
For environments with large-scale container workloads, create dedicated groups for container VMs and apply rules selectively.
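The effect of scoping rules with “Applied To” can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified model that is an assumption, not a description of NSX internals: each rule is counted once for every vNIC it is applied to, a rule with no scope applies to every vNIC, and the function name, group names, and vNIC counts are all hypothetical.

```python
def effective_rule_count(vnic_groups, rule_scopes):
    """Count rule instances on a host under a simplified replication model.

    vnic_groups: mapping of vNIC name -> set of group names the vNIC belongs to.
    rule_scopes: one entry per rule; None means applied at the DFW level
                 (all vNICs), otherwise a set of group names.
    Assumption: a rule counts once per vNIC it applies to.
    """
    total = 0
    for groups in vnic_groups.values():
        for scope in rule_scopes:
            if scope is None or groups & scope:
                total += 1
    return total


if __name__ == "__main__":
    # Illustrative host: 100 vNICs, of which 10 belong to the "web" group.
    vnics = {f"vnic-{i}": ({"web"} if i < 10 else {"other"}) for i in range(100)}

    global_rules = [None] * 2_000       # 2,000 rules applied at the DFW level
    scoped_rules = [{"web"}] * 2_000    # the same rules scoped to "web"

    print(effective_rule_count(vnics, global_rules))  # 200000
    print(effective_rule_count(vnics, scoped_rules))  # 20000
```

Under this model the same 2,000 rules produce ten times fewer instances on the host when they are scoped to the group that actually needs them.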