DFW Memory Usage Very High Due to `vsip-kentries` heap consuming excessive memory

Article ID: 404061

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Customers observe repeated alerts indicating high memory usage in the vsip-kentries heap, particularly during or after large-scale vMotion operations. vMotion may also fail and raise the DFW vMotion Failure alarm: "The DFW vMotion for DFW filter nic-##########-eth0-vmware-sfw.2 on destination host **** has failed and the port for the entity has been disconnected." The issue is associated with a large number of address sets remaining marked as LOCAL instead of GLOBAL after vMotion and filter imports. This bloats the heap and prevents memory from being released after the imports complete.

Environment

NSX 4.2.x

vDefend Firewall

Cause

When a host enters maintenance mode and multiple VMs are vMotioned off it in bulk, each vNIC on these VMs triggers an import of its firewall filter on the destination host. These filters contain address sets (addrsets), which are temporarily created as LOCAL addrsets during import.

Normally, after import, the NSX control plane (via cfgAgent) sets the GLOBAL_TABLES flag on the kernel interface (kif). This converts the LOCAL addrsets into GLOBAL addrsets so that memory can be reused efficiently. In this case, the rate of filter imports was too high for the GLOBAL_TABLES flag to be set in time (for example, ~100 filters, each with ~100 addrsets). Consequently, the LOCAL addrsets remained in memory and pushed the vsip-kentries heap past its critical threshold.
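The following sketch uses a deliberately simplified model to show why LOCAL addrsets inflate the heap. It is not the actual vsip implementation. It assumes the following:

- Every imported filter references the same address sets.
- Each addrset holds the same number of entries.
- Each entry costs one pfrkentry object of 176 bytes, which is the objSize reported for the pfrkentry zone by vsipioctl getmeminfo.

The function names and the 50-entries-per-addrset figure are hypothetical.

```python
# Simplified, hypothetical model of pfrkentry usage for LOCAL vs GLOBAL addrsets.
# Assumptions (not taken from NSX internals): all filters reference the same
# addrsets, every addrset has the same entry count, one entry = one pfrkentry.

OBJ_SIZE = 176  # bytes per pfrkentry, as reported by vsipioctl getmeminfo


def estimated_entries(num_filters: int, addrsets_per_filter: int,
                      entries_per_addrset: int, global_tables: bool) -> int:
    """Return the number of pfrkentry objects under the simplified model."""
    if global_tables:
        # GLOBAL: one shared copy of each addrset, regardless of filter count.
        return addrsets_per_filter * entries_per_addrset
    # LOCAL: every filter keeps its own private copy of every addrset.
    return num_filters * addrsets_per_filter * entries_per_addrset


def estimated_bytes(num_filters: int, addrsets_per_filter: int,
                    entries_per_addrset: int, global_tables: bool) -> int:
    """Return the estimated heap bytes consumed by those entries."""
    return OBJ_SIZE * estimated_entries(num_filters, addrsets_per_filter,
                                        entries_per_addrset, global_tables)


if __name__ == "__main__":
    # ~100 filters with ~100 addrsets each, as in the scenario above;
    # 50 entries per addrset is an arbitrary illustrative value.
    for flag in (True, False):
        size = estimated_bytes(100, 100, 50, global_tables=flag)
        print(f"global_tables={flag}: {size / 2**20:.1f} MiB")
```

With 100 filters, the LOCAL case needs 100 times as many entries as the GLOBAL case. Real ratios depend on how many addrsets the filters actually share and how large they are.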

 

Sample Logs:

  • In /var/run/log/nsx-syslog, you may see a dfw_vmotion_failure event:
2026-01-06T07:45:56.273Z Er(179) cfgAgent[2107114]: NSX 2107114 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" s2comp="nsx-monitoring" entId="########-####-####-####-###########" tid="5D785700" level="fatal" eventState="On" eventFeatureName="distributed_firewall" eventSev="critical" eventType="dfw_vmotion_failure"] The DFW vMotion for DFW filter nic-#########-eth0-vmware-sfw.2 on destination host <host.fqdn> has failed and the port for the entity has been disconnected.

 

  • In /var/run/log/vmkernel.log you may see:
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)filter nic-#########-eth0-vmware-sfw.2 flushing flow cache
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)pfr_attach_table: nic-#########-eth0-vmware-sfw.2: ERROR ***************** local root table ########-####-####-####-########### not found
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)pfr_attach_table: nic-#########-eth0-vmware-sfw.2: ERROR ***************** local root table ########-####-####-####-########### not found
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)pfioctl: DIOCADDRULE failed with error 22
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)RAL: pfioctl: error, calling pf_rm_rule
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)RAL: pfioctl: error, back from pf_rm_rule
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)addrule ioctl failed: 22
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)Sending message to cfgAgent to raising alarm for filter import failure
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)Failed to restore datapath state : Failure
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)DVFilter: 1622: Couldn't find an installed filter: vNic 0, agent vmware-si
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)DVFilter: 1727: No unrestored state left, freeing pending state for world #########
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu53:#########)VMotion: 6709: 39732254719501875 D: Received all changed pages.
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)DVFilter: 2760: World: ######### - Sent Checkpoint Restore Done for ESXIO. Status: Success
2026-01-06T07:45:44.565Z In(182) vmkernel: cpu64:2098147)DVFilter: 1756: Bringing down port due to failed DVFilter state restoration and failPolicy of FAIL_CLOSED.

 

  • On the ESXi host, run nsxcli -c get firewall thresholds to see the memory usage of the vsip-kentries heap:
                                Firewall Threshold Monitors
-------------------------------------------------------------------------------------------
 #          Name          Raised  Threshold  CurrValue  CurrSize   MaxSize  PeakEver  EverTime(ago)
14     vsip-kentries       True       90         91      2813 MB   3070 MB     91       17:32:34      <= problematic value

 

  • On the ESXi host, run /bin/vsipioctl getmeminfo | grep pfrkentry
  • Note that inUse is very high and numFail is non-zero, which indicates failed heap allocations. In this sample, inUse × objSize equals totalMem, about 2.9 GB.
/bin/vsipioctl getmeminfo | grep pfrkentry
zone 8:  pfrkentry maxObj = -1, objSize = 176, alloc = 702946228, free = 686395164, inUse = 16551064, numFail = 39099, totalMem = 2912987264

 

  • The filter import appears successful in vmkernel.log:
2025-06-23T15:14:31.906Z In(182) vmkernel: cpu52:2098112)Importing succeeded
2025-06-23T15:14:31.906Z In(182) vmkernel: cpu52:2098112)Filter creation report: filter = nic-#########-eth0-vmware-sfw.2, source = Import

 

  • However, the GLOBAL_TABLES flag is not set on the imported filter:
/bin/vsipioctl getkifflags -f nic-#########-eth0-vmware-sfw.2
PF_KIF_FLAG_GLOBAL_TABLES                0   <<< 0 indicates Global Tables are NOT enabled.

 

  • The LOCAL flag is still present on the addrset:
/bin/vsipioctl getaddrsets -f nic-#########-eth0-vmware-sfw.2 -o
addrset 004148be-3e1f-4f15-b0d8-097a1f40a0e2 {


# generation number: 0
# realization time : 2025-04-29T08:20:55
# refs: 1, 0  flags: 0x10000025 (ROOT,LOCAL,PER,ACT,ANCREF)
}

 

  • A large number of imports occurs within a short time span:
2025-06-23T15:13:16.067Z In(182) vmkernel: cpu52:2098112)Importing nic-#########-eth5647-vmware-sfw.1, Version 1000
...
2025-06-23T15:13:18.671Z In(182) vmkernel: cpu74:2098112)Importing nic-#########-eth0-vmware-sfw.2, Version 1100
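The checks above can be scripted against command output saved from the host. The Python sketch below is illustrative only. The helper names (syslog_fields, parse_threshold_row, parse_meminfo, global_tables_enabled, local_addrsets, import_burst) are hypothetical. Each parser assumes the exact layout of the sample output shown in this article, and that layout may vary between NSX releases.

```python
import re
from datetime import datetime

# Hypothetical helpers for the checks shown in this article. Field layouts are
# inferred only from the sample outputs above and may differ between releases.


def syslog_fields(line: str) -> dict:
    """Extract key="value" pairs from an nsx-syslog structured-data line."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def parse_threshold_row(row: str) -> dict:
    """Parse one data row of 'nsxcli -c get firewall thresholds'."""
    parts = row.split()
    return {
        "name": parts[1],
        "raised": parts[2] == "True",
        "threshold": int(parts[3]),
        "curr_value": int(parts[4]),
        "curr_size_mb": int(parts[5]),  # parts[6] is the "MB" unit
        "max_size_mb": int(parts[7]),   # parts[8] is the "MB" unit
    }


def parse_meminfo(line: str) -> dict:
    """Parse 'name = number' fields from a 'vsipioctl getmeminfo' zone line."""
    return {k: int(v) for k, v in re.findall(r"(\w+) = (-?\d+)", line)}


def global_tables_enabled(kifflags_output: str) -> bool:
    """True only if PF_KIF_FLAG_GLOBAL_TABLES is present with a non-zero value."""
    m = re.search(r"PF_KIF_FLAG_GLOBAL_TABLES\s+(\d+)", kifflags_output)
    return m is not None and m.group(1) != "0"


def local_addrsets(getaddrsets_output: str) -> list:
    """Return the IDs of addrsets whose decoded flag list contains LOCAL."""
    found, current = [], None
    for line in getaddrsets_output.splitlines():
        header = re.match(r"addrset (\S+) \{", line)
        if header:
            current = header.group(1)
            continue
        flags = re.search(r"flags: 0x[0-9a-fA-F]+ \(([^)]*)\)", line)
        if flags and current and "LOCAL" in flags.group(1).split(","):
            found.append(current)
    return found


def import_burst(vmkernel_lines) -> tuple:
    """Return (count of 'Importing nic-' lines, seconds from first to last)."""
    stamps = [
        datetime.fromisoformat(line.split()[0].rstrip("Z"))
        for line in vmkernel_lines
        if "Importing nic-" in line
    ]
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()
```

Here is what the helpers return on the samples in this article:

- parse_threshold_row on the vsip-kentries row reports raised=True, with a current value of 91 against a threshold of 90.
- parse_meminfo on the pfrkentry line shows inUse equal to alloc minus free, with a non-zero numFail.
- import_burst on the two Importing lines shown reports 2 imports about 2.6 seconds apart. The real log excerpt elides the imports in between, so the actual count is higher.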

Resolution

A permanent fix is being targeted for a future NSX release.

 

Workarounds 

To mitigate the issue in existing environments:

  • Perform vMotion in smaller batches to avoid overwhelming the filter import process.

  • Use CIDR blocks or dynamic criteria in NSGroups rather than listing individual IP addresses.

  • Eliminate overlapping or excessively large dynamic groups to reduce addrset footprint.

  • If Layer 2 rules aren't in use, set global_macset_optimization_mode_enabled to true. This eliminates MAC addresses from address sets and frees up heap space.
    • Get current settings via GET policy/api/v1/infra/settings/firewall/security
    • Modify global_macset_optimization_mode_enabled to be true instead of false.
    • Apply the setting via PATCH policy/api/v1/infra/settings/firewall/security
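Two of these workarounds can be sketched in Python under stated assumptions.

collapse_ips uses the standard ipaddress module to show how a list of individual IPs reduces to a few CIDR blocks. This assumes each prefix in a group becomes one address-set entry.

enable_macset_optimization and build_patch_request are hypothetical helpers that follow the GET, modify, PATCH sequence above, with these assumptions:

- The manager hostname and credentials are placeholders.
- HTTP basic authentication is assumed.
- Whether NSX expects the full object returned by GET or only the changed field should be confirmed against the NSX API reference before use.

```python
import base64
import ipaddress
import json
import urllib.request


def collapse_ips(ips):
    """Collapse individual IP addresses into the fewest equivalent CIDR blocks.

    All addresses must be the same IP version; ipaddress raises TypeError otherwise.
    """
    networks = ipaddress.collapse_addresses(ipaddress.ip_network(ip) for ip in ips)
    return [str(net) for net in networks]


def enable_macset_optimization(settings: dict) -> dict:
    """Return a copy of the GET response body with the MAC-set optimization flag on."""
    body = dict(settings)  # shallow copy; the caller's dict is left unchanged
    body["global_macset_optimization_mode_enabled"] = True
    return body


def build_patch_request(manager: str, user: str, password: str,
                        body: dict) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for the firewall security settings."""
    url = f"https://{manager}/policy/api/v1/infra/settings/firewall/security"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # assumes basic auth is enabled
        },
        method="PATCH",
    )
```

The request is only built here so it can be reviewed before anything is sent. To apply it, pass it to urllib.request.urlopen. The NSX Manager certificate may need to be trusted first.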

PR 3540135