Customers may observe a purple diagnostic screen (PSOD) on ESXi hosts under specific conditions involving the dvfilter component.
The issue is caused when the dvfilter state restoration process is invoked twice due to a race condition. This occurs during port enablement when packets arrive before the initial state restoration completes, leading to duplicate restoration calls and eventually triggering a PSOD.
VMware ESXi 8.0.x
VMware NSX
A race condition in dvfilter is the root cause.
When a port is enabled, dvfilter initiates state restoration via an asynchronous helper task.
If packets are received before the first restore completes, a second restore request is queued.
This results in duplicate state restoration calls, leading to heap corruption and a PSOD.
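The ordering described above can be modeled in a few lines of Python. This is a simplified illustration under stated assumptions, not dvfilter code: `FilterState`, `restore_state`, and `enable_port` are hypothetical names, a thread stands in for the asynchronous helper task, and the `imports` counter stands in for the state import. The once-only guard is one generic way to make a restore idempotent. It is not a description of the fix shipped in the ESXi 9.1 branch.

```python
import threading


class FilterState:
    """Hypothetical model of one filter's restore bookkeeping (not VMware code)."""

    def __init__(self, name, guarded):
        self.name = name
        self.guarded = guarded      # True = once-only guard enabled
        self.imports = 0            # number of state imports that actually ran
        self._claimed = False       # set by the first restore that claims the work
        self._lock = threading.Lock()

    def restore_state(self, started=None, release=None):
        if self.guarded:
            # Claim the restore atomically; a later request sees the claim and skips.
            with self._lock:
                if self._claimed:
                    return False
                self._claimed = True
        if started is not None:
            started.set()           # tell the caller this restore is under way
        if release is not None:
            release.wait()          # simulate an import that has not finished yet
        with self._lock:
            self.imports += 1       # stands in for importing the saved state
        return True


def enable_port(state):
    """Replay the reported ordering: a packet triggers a second restore mid-import."""
    started, release = threading.Event(), threading.Event()
    helper = threading.Thread(target=state.restore_state, args=(started, release))
    helper.start()                  # asynchronous helper task started by port enable
    started.wait()                  # the first restore is now in progress
    state.restore_state()           # packet arrives -> second restore request
    release.set()                   # let the first restore finish
    helper.join()
    return state.imports


print(enable_port(FilterState("nic-2786293-eth0-vmware-sfw.2", guarded=False)))  # 2
print(enable_port(FilterState("nic-2786293-eth0-vmware-sfw.2", guarded=True)))   # 1
```

Without the guard, the second request runs a second import while the first is still in progress, which corresponds to the duplicate restoration seen in the logs. With the guard, the duplicate request is dropped and the import runs once.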
The issue is already fixed in the ESXi 9.1 branch.
Log Excerpts:
Backtrace at PSOD (pointers, addresses, exact file paths, and line numbers are masked; the call flow is intact):
(gdb) where
#0 DLM_free (msp=0x************, mem=<optimized out>, allowTrim=1) at …/dlmalloc.c:****
#1 0x**************** in Heap_Free (heap=0x************, mem=<optimized out>) at …/heap.c:****
#2 0x**************** in vmk_HeapFree (heap=<optimized out>, mem=<optimized out>) at …/vmkapi_heap.c:****
#3 0x**************** in VSIPFreeFromHeapWithoutAccounting (heapID=<optimized out>, data=<optimized out>) at …/vsip_util.c:****
#4 0x**************** in uma_zfree_arg (z=0x************, item=item@entry=0x************, arg=arg@entry=0x0) at …/glue.c:****
#5 0x**************** in uma_zfree (item=0x************, zone=<optimized out>) at …/uma.h:****
#6 pfr_destroy_ktable (kif=kif@entry=0x************, kt=kt@entry=0x************, flushflags=7, set=PFR_SET_ACTIVE) at …/pf_table.c:****
#7 0x**************** in pfr_setflags_ktable (…) at …/pf_table.c:****
#8 0x**************** in pfr_detach_table (…) at …/pf_table.c:****
#9 0x**************** in pf_tbladdr_remove (…) at …/pf.c:****
#10 0x**************** in pf_rm_rule (…) at …/pf_ioctl.c:****
#11 0x**************** in pf_commit_rules (…) at …/pf_ioctl.c:****
#12 0x**************** in pfioctl (…) at …/pf_ioctl.c:****
#13 0x**************** in VSIPCommitTransaction (…) at …/msg2pf.c:****
#14 0x**************** in PFImportSingleRulesetTLV (…) at …/migrate.c:****
#15 0x**************** in PFImportRulesTLV (…) at …/migrate.c:****
#16 0x**************** in PFImportStateTLV (…) at …/migrate.c:****
#17 0x**************** in PFImportState (…) at …/migrate.c:****
#18 0x**************** in VSIPDVFRestoreState (…) at …/vsip_dvfilter.c:****
In the host's vmkernel logs:
Filter Creation
Creating filter, expect restore
Filter nic-2786293-eth0-vmware-sfw.2 created
Registered filter nic-2786293-eth0-vmware-sfw.2
First State Restoration (Successful)
Restore state called for filter nic-2786293-eth0-vmware-sfw.2
Importing nic-2786293-eth0-vmware-sfw.2
Importing succeeded
Filter creation report: source = Import
Second State Restoration for the Same Filter (Triggers the PSOD)
Restore state called for filter nic-2786293-eth0-vmware-sfw.2
Importing nic-2786293-eth0-vmware-sfw.2
Unconfigured filter nic-2786293-eth0-vmware-sfw.2
--- PSOD Triggered ---
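The log signature above (two "Restore state called for filter" messages for the same filter after a single "Registered filter" message) can be searched for with a short script. This is a heuristic sketch, not an official diagnostic: `find_duplicate_restores` is a hypothetical helper, and the filter-name pattern `nic-<id>-eth<n>-vmware-sfw.<slot>` is an assumption generalized from the single example in this article.

```python
import re

# Assumed filter-name shape, generalized from the example in this article.
FILTER = r"(nic-\d+-eth\d+-vmware-sfw\.\d+)"
# One combined pattern so events are seen in the order they appear in each line.
EVENT = re.compile(r"(Registered filter|Restore state called for filter) " + FILTER)


def find_duplicate_restores(lines):
    """Return filters whose state restore was called more than once
    since the filter was last registered (heuristic only)."""
    restores = {}
    suspects = []
    for line in lines:
        for kind, name in EVENT.findall(line):
            if kind == "Registered filter":
                restores[name] = 0      # new filter instance: start counting again
            else:
                restores[name] = restores.get(name, 0) + 1
                if restores[name] == 2:
                    suspects.append(name)  # second restore for the same filter
    return suspects


log = [
    "Registered filter nic-2786293-eth0-vmware-sfw.2",
    "Restore state called for filter nic-2786293-eth0-vmware-sfw.2",
    "Importing succeeded",
    "Restore state called for filter nic-2786293-eth0-vmware-sfw.2",
]
print(find_duplicate_restores(log))  # ['nic-2786293-eth0-vmware-sfw.2']
```

The function accepts any iterable of lines, so an open file handle for a copy of the host's vmkernel log (typically `/var/log/vmkernel.log`, or the same file inside a support bundle) can be passed in directly.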