NSX prepared ESXi Host PSODs due to VSIP fp2_rulematch issue

Article ID: 391822

Products

VMware vDefend Firewall

Issue/Introduction

  • The ESXi host crashes with a PSOD (Exception 14 in vmkernel) at fp2_rulematch_set in datapath/esx/modules/vsip/vsip_pf/pf/pf_policy_lookup.c:133
  • Host PSODs are observed after upgrading the environment to NSX version 4.2.0 or 4.2.1
  • The vmkernel log shows the following entries before the crash:
2025-01-13T04:04:07.431Z In(182) vmkernel: cpu48:2285125)Admission failure in path: host/user/pool0/vm.2285117:vmmanon.2285117
2025-01-13T04:04:07.431Z In(182) vmkernel: cpu48:2285125)vmmanon.2285117 (1590056) requires 1812 KB, asked 1812 KB from vm.2285117 (1590026) which has 204040 KB occupied and 760 KB available.
2025-01-13T04:04:07.431Z In(182) vmkernel: cpu48:2285125)Admission failure in path: host/user/pool0/vm.2285117:vmmanon.2285117
2025-01-13T04:04:07.431Z In(182) vmkernel: cpu48:2285125)vmmanon.2285117 (1590056) requires 1812 KB, asked 1812 KB from vm.2285117 (1590026) which has 204040 KB occupied and 760 KB available.
2025-01-13T04:04:49.647Z In(182) vmkernel: cpu17:2100449)vdl2: VDL2BFDCheckAndUpdateSessionMac:1536: [nsx@6876 comp="nsx-esx" subcomp="vdl2-24302014"]BFD local vtep segment is same as remote vtep segment10.114.132.0
2025-01-13T04:05:04.185Z In(182) vmkernel: cpu4:2098475)lpfc: lpfc_change_queue_depth:765: vmhba4 3295 lun queue depth changed [0:0:32] old=62, new=63
2025-01-13T04:05:05.560Z In(182) vmkernel: cpu3:2098330)lpfc: lpfc_change_queue_depth:765: vmhba5 3295 lun queue depth changed [0:0:26] old=61, new=62
2025-01-13T04:05:35.968Z In(182) vmkernel: cpu3:2098678)Unmap6: 10536: DS_PS5200_23: Acquired UC with 0 OP ucOffset 94208 ucIndex: 8
2025-01-13T04:05:35.969Z In(182) vmkernel: cpu3:2098678)Unmap6: 10568: DS_PS5200_23: Unmap capability 0 OP i: 0 offset: 65536 endROffset: 589824 Acquired: TRUE
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pfioctl: ticket 64400 != [1]64399
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)VSIPConversionCreateRuleSet: Cannot insert #105 rule 4369108: 16
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pf_rollback_rules: rs_num: 1, anchor: mainrs
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pf_rollback_rules: rs_num: 2, anchor: mainrs
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pf_rollback_rules: rs_num: 4, anchor: mainrs
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pf_rollback_rules: rs_num: 5, anchor: mainrs
2025-01-13T04:05:54.128Z In(182) vmkernel: cpu11:2100435)pf_rollback_rules: rs_num: 6, anchor: mainrs
  • In the log file "nsx-syslog.log", the following entry can be observed:
2025-01-13T04:05:54.128Z Er(179) cfgAgent[2100414]: NSX 2100414 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="2AAE1700" level="error" errorCode="LCP01106"] dfw: Failed to apply rule config to filter nic-2594511-eth0-vmware-sfw.2 of vif e02435fa-1eea-4d8a-8fef-04397c5f808e: ioctl failed
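To check collected logs for the messages listed above, a small script along the following lines may help. This is a minimal sketch, not a Broadcom tool: the patterns are derived only from the log excerpts in this article, and the names SIGNATURES, scan_lines and scan_file are hypothetical. A match indicates that the same messages were logged; it does not by itself confirm this issue.

```python
import re

# Patterns derived from the log excerpts in this article (assumption: the
# message wording is the same on the host being checked).
SIGNATURES = {
    "ticket_mismatch": re.compile(r"pfioctl: ticket \d+ != \[\d+\]\d+"),
    "rule_insert_failure": re.compile(
        r"VSIPConversionCreateRuleSet: Cannot insert #\d+ rule \d+"
    ),
    "ruleset_rollback": re.compile(r"pf_rollback_rules: rs_num: \d+, anchor: \S+"),
    "cfgagent_ioctl_failure": re.compile(
        r"dfw: Failed to apply rule config to filter \S+ .*ioctl failed"
    ),
}


def scan_lines(lines):
    """Return {signature name: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

For example, scan_file can be called once on a copy of the vmkernel log and once on a copy of nsx-syslog.log, and the timestamps of the matching lines compared with the time of the PSOD.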

 

Environment

NSX Version 4.2.0 and 4.2.1

Cause

The VSIP function pf_find_anchor() is not concurrency-safe. When multiple threads search the anchor tree for a ruleset at the same time, an incorrect ruleset pointer can be returned.
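The general hazard can be illustrated with a toy model. This is a generic sketch of a lookup that is not concurrency-safe, not the VSIP code: the article does not describe the internal mechanism, so the shared scratch key used here is an assumption chosen for illustration, and AnchorTree and its methods are hypothetical names. The interleaving of two callers is replayed by hand so that the outcome is deterministic.

```python
import threading


class AnchorTree:
    """Toy anchor tree that maps an anchor path to a ruleset object."""

    def __init__(self, rulesets):
        self._rulesets = dict(rulesets)
        self._scratch_key = None  # shared scratch state: the hazard (assumed)
        self._lock = threading.Lock()

    def find_unsafe_begin(self, path):
        # Step 1 of the unsafe lookup: store the search key in shared state.
        self._scratch_key = path

    def find_unsafe_finish(self):
        # Step 2: resolve whatever key is in shared state at this moment.
        return self._rulesets[self._scratch_key]

    def find_safe(self, path):
        # Holding the lock across both steps makes the lookup atomic.
        with self._lock:
            self._scratch_key = path
            return self._rulesets[self._scratch_key]


tree = AnchorTree({"filter-a": "ruleset-a", "filter-b": "ruleset-b"})

# Replay one bad interleaving: caller B starts its lookup between
# caller A's two steps.
tree.find_unsafe_begin("filter-a")        # caller A, step 1
tree.find_unsafe_begin("filter-b")        # caller B, step 1
result_for_a = tree.find_unsafe_finish()  # caller A, step 2

print(result_for_a)                # ruleset-b: caller A received the wrong ruleset
print(tree.find_safe("filter-a"))  # ruleset-a
```

In kernel code the returned value is a pointer rather than a string, so a caller that receives the wrong ruleset can go on to dereference memory it should not touch.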

Resolution

Upgrade to NSX 4.2.2.1.

If you encounter this PSOD, open a support case with Broadcom so that the issue can be validated.

Additional Information

Please reach out to support if this issue is observed.