PCPU locks on ESXi host causing BGP flaps on hosted NSX Edge nodes


Article ID: 439825



Products

VMware NSX

VMware vSphere ESXi

Issue/Introduction

  • BFD/BGP flaps are observed on Edge nodes:

    2026-02-10T12:49:41.075Z nsxedge NSX 1 ROUTING [nsx@4413 comp="nsx-edge" subcomp="nsxa" s2comp="routing" level="ERROR" eventId="vmwNSXRoutingStatus"] {"event_state":0,"event_external_reason":"All BGP/BFD sessions DOWN","event_src_comp_id":"xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx"}

  • Around the same time, PCPU heartbeat warnings and NMIs are logged on the ESXi host running the affected Edge node, together with several vsip module related messages in vmkernel.log:


    2026-02-10T12:49:37.888Z In(182) vmkernel: cpu3:2102043)pfa_ring_buffer_init: allocation failed
    2026-02-10T12:49:37.888Z In(182) vmkernel: cpu3:2102043)VSIPConversionSetConnAttr: failed to set connection attributes

    2026-02-10T12:49:37.895Z In(182) vmkernel: cpu48:2102043)VSIPConversionSetConnAttr: failed to set connection attributes
    2026-02-10T12:49:38.230Z Wa(180) vmkwarning: cpu40:2102043)WARNING: Heap: 3670: Heap vsip-attr already at its maximum size. Cannot expand.
    2026-02-10T12:49:38.309Z In(182) vmkernel: cpu34:2100473)pfa_attrconn_lookup: failed to allocate fqdn attribute connection
    2026-02-10T12:49:39.071Z In(182) vmkernel: cpu48:2100464)VSIPConvertPfPDescToFlowRecord: flow attributes truncated (42/2030/2066)

    2026-02-10T12:49:39.071Z In(182) vmkernel: cpu48:2100464)VSIPConvertPfPDescToFlowRecord: flow attributes truncated (42/2030/2066)
    2026-02-10T12:49:39.437Z In(182) vmkernel: cpu117:2099126)VSIPConvertPfToFlowRecord: flow attributes truncated (42/2037/2066)
    2026-02-10T12:49:43.594Z Wa(180) vmkwarning: cpu81:20655819)WARNING: Heartbeat: 961: PCPU 41 didn't have a heartbeat for 5 seconds, timeout is 10, 1 IPIs sent; *may* be locked up.
    2026-02-10T12:49:43.594Z In(182) vmkernel: cpu81:20655819)Heartbeat: 1014: Sending timer IPI to PCPU 41

    2026-02-10T12:49:43.595Z Al(177) vmkalert: cpu41:2102043)ALERT: NMI: 738: NMI IPI: PC 0x420011f84637, SP 0x453a5951b440 (Src 0x1, CPU41)
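
To check whether the Edge routing event and the host-side messages line up in time, the two logs can be scanned for the signatures shown above. The following is a minimal sketch, not a supported tool: the function names, the pattern set, and the 30-second window are assumptions chosen for illustration, and the patterns are taken only from the excerpts in this article.

```python
import re
from datetime import datetime, timedelta

# Signatures copied from the log excerpts in this article (assumed set; extend as needed).
HOST_PATTERNS = {
    "pcpu_heartbeat": re.compile(r"Heartbeat: \d+: PCPU (\d+) didn't have a heartbeat"),
    "nmi_ipi": re.compile(r"ALERT: NMI: \d+: NMI IPI"),
    "vsip_heap_full": re.compile(r"Heap vsip-attr already at its maximum size"),
    "vsip_alloc_failed": re.compile(
        r"pfa_ring_buffer_init: allocation failed|pfa_attrconn_lookup: failed to allocate"
    ),
}
EDGE_PATTERN = re.compile(r"All BGP/BFD sessions DOWN")
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    m = TIMESTAMP.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None


def scan_host_log(lines):
    """Return (timestamp, signature_name) for each matching vmkernel.log line."""
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        for name, pattern in HOST_PATTERNS.items():
            if pattern.search(line):
                hits.append((ts, name))
    return hits


def scan_edge_log(lines):
    """Return timestamps of 'All BGP/BFD sessions DOWN' events in the Edge syslog."""
    events = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and EDGE_PATTERN.search(line):
            events.append(ts)
    return events


def correlate(edge_events, host_hits, window_seconds=30):
    """Map each Edge event to the host signatures seen within +/- window_seconds."""
    window = timedelta(seconds=window_seconds)
    return {
        event: sorted({name for ts, name in host_hits if abs(ts - event) <= window})
        for event in edge_events
    }
```

Feed `scan_edge_log` the lines of the Edge node syslog and `scan_host_log` the lines of the host's vmkernel.log, then pass both results to `correlate`. An Edge event that maps to both the heartbeat/NMI signatures and the vsip heap signatures matches the pattern described in this article; an empty list suggests the flap has a different cause.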

 

Environment

VMware NSX

VMware vSphere ESXi

Cause

A PCPU lockup on the ESXi host causes the hosted Edge node to lose network connectivity, which in turn brings down its BFD/BGP sessions.

Resolution

The vsip module is part of the Distributed Firewall (DFW), so this issue requires further investigation by the NSX Security team.

If you suspect this issue has occurred in your environment, open a case with Broadcom Support under the component VMware NSX - Firewall (DFW, GFW, IDS/IPS).