ESXi hosts may experience a PSOD when NSX DFW L7 rules using FQDN-based context profiles are configured.
The PSOD stack trace references VSIP, PF, and FQDN-related code paths:
(gdb) bt
#0 memscan (c=0, no=<optimized out>, ni=<optimized out>, so=<optimized out>, si=0x************) at …/nsx_libc_memscan.h:****
#1 strlen (s=0x************) at …/nsx_libc_string.h:****
#2 pfa_get_counts (kif=0x************, nAttr=0x************, attrLen=0x************) at …/pf_attribute.c:****
#3 0x**************** in pfioctl (kif=0x************, dev=<optimized out>, cmd=<optimized out>, addr=<optimized out>, flags=<optimized out>, td=<optimized out>) at …/pf_ioctl.c:****
#4 0x**************** in VSIPConversionGetConnectionStat (kif=0x************, msg=0x************, msgLen=<optimized out>, result=0x************) at …/msg2pf.c:****
#5 0x**************** in VSIPToPFIoctl (cookie=<optimized out>, cmd=<optimized out>, data=<optimized out>, dataLen=<optimized out>, result=0x************) at …/msg2pf.c:****
#6 0x**************** in VSIPFlowGetRecordsForFilter (sol=<optimized out>, filter=0x************, data=0x************) at …/vsip_flow.c:****
#7 0x**************** in VSIPDVFConfigFilterByName (fpAgentName="vmware-sfw", filterName=<optimized out>, iter=<optimized out>, data=0x************) at …/vsip_dvfilter.c:****
#8 0x**************** in VSIPFlowGetRecords (fpAgentName="vmware-sfw", dvsPort=<optimized out>, cmdId=<optimized out>, buffer=<optimized out>, bufLen=<optimized out>, includeAttrs=1) at …/vsip_flow.c:****
#9 0x**************** in VSIPIoctlFlowData (cmd=<optimized out>, iocData=0x************, fnData=<optimized out>, result=<optimized out>) at …/vsip_fw_ioctl.c:****
#10 0x**************** in VSIPIoctlImpl (cmd=<optimized out>, req=0x************, result=0x************) at …/vsip_ioctl.c:****
#11 0x**************** in VSIPCharDevIoctl (cmd=<optimized out>, userData=<optimized out>, result=0x************) at …/vsip_dev.c:****
#12 0x**************** in VMKAPICharDevIoctl (handle=0x************, userData=<optimized out>, cmd=<optimized out>) at …/vmkapi_char.c:****
#13 0x**************** in VMKAPICharDevDevfsWrapIoctl (handle=0x************, cmd=<optimized out>, userData=<optimized out>, ioctlResult=0x************) at …/vmkapi_char.c:****
#14 0x**************** in CharDriverIoctl (deviceHandleID=<optimized out>, cmd=<optimized out>, dataInOut=0x************) at …/charDriver.c:****
#15 0x**************** in FDS_Ioctl (dataInOut=0x************, cmd=FDS_IOCTL_PASS_THRU, fdsHandle=<optimized out>) at …/fsDeviceSwitch.h:****
#16 0x**************** in DevFSIoctl (fileDesc=0x************, fhID=<optimized out>, cmd=IOCTLCMD_DEVFS_OPAQUE, dataIn=0x************, result=0x************) at …/devfs.c:****
#17 0x**************** in FSSVec_Ioctl (desc=<optimized out>, fhID=<optimized out>, cmd=<optimized out>, dataIn=<optimized out>, result=<optimized out>) at …/fsSwitchVec.c:****
#18 0x**************** in FSSObjectIoctlCommon (fhID=<optimized out>, file=0x************, cmd=<optimized out>, dataIn=0x************, result=0x************) at …/fsSwitch.c:****
#19 0x**************** in FSS_IoctlByFH (fileHandleID=<optimized out>, cmd=<optimized out>, dataIn=0x************, result=0x************, ioFlags=<optimized out>) at …/fsSwitch.c:****
#20 0x**************** in UserFile_PassthroughIoctl (vmfsObj=<optimized out>, cmd=<optimized out>, userData=<optimized out>, result=<optimized out>) at …/userFile.c:****
#21 0x**************** in UserVmfs_Ioctl (obj=<optimized out>, cmd=<optimized out>, userArg=<optimized out>, ioctlReturnCode=<optimized out>) at …/userVmfs.c:****
#22 0x**************** in LinuxFileDesc_Ioctl (fd=<optimized out>, cmd=<optimized out>, userData=<optimized out>) at …/linuxFileDesc.c:****
#23 0x**************** in User_LinuxSyscallHandler (fullFrame=0x************) at …/user.c:****
#24 0x**************** in gate_entry () at …/gates64.S:****
#25 0x**************** in ?? ()
The host recovers only after a reboot.
Environment
VMware ESXi 8.0 U3
NSX versions prior to:
NSX 4.2.3.3
NSX 9.0.2
Distributed Firewall with L7 FQDN-based rules / context profiles
Cause
The vmkernel core dump confirms that the PSOD is caused by corruption of FQDN domain_list data structures within the NSX datapath.
The PSOD occurs while processing PF attribute connections associated with FQDN-based L7 DFW rules.
The attribute connection (ac) is marked with:
During DNS resolution, the domain_list linked list within the fqdn_node is incorrectly updated.
Backtrace analysis shows:
Invalid tqh_first and tqh_last pointers in domain_list
Corrupted memory addresses being passed to strlen() and memscan()
Resulting in a kernel crash due to access of invalid memory
Example:
The issue can be triggered during any of the following operations on FQDN entries: read, update, or delete.
These operations typically occur during DNS resolution or FQDN re-evaluation.
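To illustrate why invalid tqh_first and tqh_last pointers are fatal, the following is a simplified Python model of a head/tail linked list, not the NSX kernel code. The names Node, DomainList, is_consistent, and live are hypothetical, and the `first`/`last` fields only stand in for tqh_first/tqh_last (in a BSD-style tail queue, tqh_last is a pointer to the last element's next field rather than to the element itself). The model shows how a list that still references a released entry fails a basic walk, which is the condition under which strlen() ends up reading an invalid address.

```python
class Node:
    """One entry of the modeled list; stands in for a domain_list element."""

    def __init__(self, domain):
        self.domain = domain
        self.next = None


class DomainList:
    """Head of the modeled list; first/last stand in for tqh_first/tqh_last."""

    def __init__(self):
        self.first = None
        self.last = None

    def append(self, node):
        # Link the node at the tail and move the tail reference forward.
        if self.first is None:
            self.first = node
        else:
            self.last.next = node
        self.last = node


def is_consistent(lst, live_nodes):
    """Walk the list and check that every reference is to a live node
    and that the walk ends at the recorded tail."""
    if lst.first is None:
        return lst.last is None
    seen = set()
    tail = None
    node = lst.first
    while node is not None:
        # A reference to a released node, or a cycle, means corruption.
        if node not in live_nodes or node in seen:
            return False
        seen.add(node)
        tail = node
        node = node.next
    return tail is lst.last


a, b = Node("example.com"), Node("example.org")
live = {a, b}  # models the set of entries that are still allocated
domains = DomainList()
domains.append(a)
domains.append(b)
healthy = is_consistent(domains, live)

# Model an incorrect update: the entry is released but never unlinked.
live.discard(b)
stale = is_consistent(domains, live)
print(healthy, stale)
```

In the kernel there is no such safety check on the read path shown in the backtrace, so the stale reference is dereferenced directly.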
Workaround
Use L4 Distributed Firewall rules instead of L7 FQDN-based rules.
Specifically:
Disable or remove L7 DFW rules that reference FQDN context profiles
Replace them with equivalent IP / CIDR / L4-based rules
This avoids the affected FQDN code path and prevents corruption of the domain_list structure.
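As a rough aid for finding the rules to replace, the sketch below filters rule and context profile definitions that have already been exported into Python dictionaries. The field names (`path`, `attributes`, `key`, `DOMAIN_NAME`, `profiles`, `display_name`) and all sample values are assumptions about the export format, so verify them against your own data before relying on the result. It makes no API calls.

```python
def fqdn_profile_paths(context_profiles):
    """Return the paths of context profiles that carry a DOMAIN_NAME attribute."""
    paths = set()
    for profile in context_profiles:
        for attribute in profile.get("attributes", []):
            # Assumption: FQDN attributes are keyed as "DOMAIN_NAME".
            if attribute.get("key") == "DOMAIN_NAME":
                paths.add(profile["path"])
                break
    return paths


def rules_using_fqdn(rules, context_profiles):
    """Return the names of rules that reference at least one FQDN profile."""
    fqdn_paths = fqdn_profile_paths(context_profiles)
    return [
        rule["display_name"]
        for rule in rules
        if fqdn_paths.intersection(rule.get("profiles", []))
    ]


# Hypothetical sample data for illustration only.
sample_profiles = [
    {
        "path": "/infra/context-profiles/updates-fqdn",
        "attributes": [{"key": "DOMAIN_NAME", "value": ["*.example.com"]}],
    },
    {
        "path": "/infra/context-profiles/ssh-app",
        "attributes": [{"key": "APP_ID", "value": ["SSH"]}],
    },
]
sample_rules = [
    {"display_name": "allow-updates",
     "profiles": ["/infra/context-profiles/updates-fqdn"]},
    {"display_name": "allow-ssh",
     "profiles": ["/infra/context-profiles/ssh-app"]},
    {"display_name": "allow-dns", "profiles": []},
]

print(rules_using_fqdn(sample_rules, sample_profiles))
```

Each rule reported this way is a candidate for replacement with an IP, CIDR, or L4-based equivalent.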
Resolution
A permanent fix has been implemented by Engineering and is available in the following NSX releases:
NSX 4.2.3.3
NSX 9.0.2
Upgrading to one of these versions resolves the issue and allows safe use of L7 FQDN-based DFW rules.
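When checking several NSX deployments, the comparison against the fixed releases can be scripted. The sketch below is a minimal example that assumes the version is available as a dotted numeric string; the helper names are hypothetical, and only the 4.x and 9.x release lines named in this article are handled.

```python
# First fixed release per major release line, as listed in this article.
FIXED_RELEASES = {4: (4, 2, 3, 3), 9: (9, 0, 2)}


def parse_version(text):
    """Turn a dotted version such as '4.2.3.3' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version_text):
    """Return True if the version is at or above the fixed release of its line."""
    version = parse_version(version_text)
    fixed = FIXED_RELEASES.get(version[0])
    if fixed is None:
        # The article names no fixed release for other release lines.
        raise ValueError("no fixed release listed for NSX %s.x" % version[0])
    # Pad with zeros so that '9.0.2' and '9.0.2.0' compare as equal.
    width = max(len(version), len(fixed))
    padded_version = version + (0,) * (width - len(version))
    padded_fixed = fixed + (0,) * (width - len(fixed))
    return padded_version >= padded_fixed


for candidate in ("4.2.1.0", "4.2.3.3", "9.0.1.0", "9.0.2"):
    print(candidate, has_fix(candidate))
```

A version that returns False is still exposed and should keep the workaround in place until it is upgraded.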