NSX Edge Nodes management CPU spikes above 90% and state becomes "unknown"

Article ID: 367385

Products

VMware NSX

Issue/Introduction

  • Edge Nodes enter an unknown state, and BGP and BFD sessions go down.
  • Edge management CPU spikes above 90%.
  • Entries like the following appear in /var/log/syslog on the Edge Node, showing extremely long blocked times:
    NSX 1180088 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu2" level="WARN"] blocked 4096000 ms waiting for dp-ipc31 to quiesce

    NSX 1180088 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="dp-si-purge5" level="WARN"] blocked 4096000 ms waiting for dp-ipc31 to quiesce
    .
    .
    .

    NSX 1894731 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="ERROR"] pf_snat_port_delete_alarm_processing: failed to find snat hash entry. NAT addr: #######, daddr: #######, dport:#####, vrf: 202, num of snat ips crossing threshold: 0

    datapathd 1894731 firewalldp [ERROR] pf_snat_port_delete_alarm_processing: failed to find snat hash entry. NAT addr: #######, daddr: #######, dport:#####, vrf: 202, num of snat ips crossing threshold: 0

    datapath-systemd-helper 1894616 - -  2024-04-22T12:01:12Z datapathd 1894731 firewalldp [ERROR] pf_snat_port_delete_alarm_processing: failed to find snat hash entry. NAT addr: #######, daddr: #######, dport:#####, vrf: 202, num of snat ips crossing threshold: 0

    NSX 1894731 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" tname="dp-fw-purge11" level="ERROR"] pf_snat_port_delete_alarm_processing: failed to find snat hash entry. NAT addr: #######, daddr: #######, dport:#####, vrf: 174, num of snat ips crossing threshold: 0
    .
    .
    .
    NSX 1492808 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="ERROR"] pf_snat_port_add_alarm_processing: failed to allocate snat hash entry. NAT addr: c#######, daddr: #######, dport:#####, vrf: 101, num of snat ips crossing threshold: 0
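
To check a saved copy of the Edge Node syslog for these symptoms, a small script along the following lines may help. This is a minimal sketch, not a Broadcom tool: the regular expressions are inferred only from the sample lines above, and the function names and the 60-second threshold are illustrative assumptions.

```python
import re

# Patterns inferred from the sample log lines in this article only.
BLOCKED_RE = re.compile(
    r'tname="(?P<thread>[^"]+)".*?'
    r'blocked (?P<ms>\d+) ms waiting for (?P<target>\S+) to quiesce'
)
SNAT_RE = re.compile(
    r'pf_snat_port_(?:add|delete)_alarm_processing: '
    r'failed to (?:allocate|find) snat hash entry'
)

def find_long_blocks(lines, threshold_ms=60_000):
    """Return (thread, target, blocked_ms) for ovs-rcu warnings at or above threshold_ms.

    threshold_ms is an arbitrary illustrative cutoff, not a documented limit.
    """
    hits = []
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            hits.append(
                (match.group("thread"), match.group("target"), int(match.group("ms")))
            )
    return hits

def count_snat_hash_errors(lines):
    """Count SNAT hash entry allocation/lookup failures."""
    return sum(1 for line in lines if SNAT_RE.search(line))

# Two lines copied from the samples above, used as a self-check.
SAMPLE = [
    'NSX 1180088 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" '
    's2comp="ovs-rcu" tname="urcu2" level="WARN"] '
    'blocked 4096000 ms waiting for dp-ipc31 to quiesce',
    'NSX 1492808 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" '
    's2comp="firewalldp" level="ERROR"] pf_snat_port_add_alarm_processing: '
    'failed to allocate snat hash entry.',
]
```

Because both functions accept any iterable of lines, an open file handle for a copy of /var/log/syslog can be passed in directly.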

Environment

VMware NSX-T Data Center
VMware NSX

Cause

This issue occurs when there are a large number of NAT connections and the system cannot allocate space for a hash entry needed by the SNAT alarm feature. While attempting (and failing) to allocate memory for the hash entry, the code locks the hash entry. The subsequent function that tries to clear the entry then attempts to take the same lock again, and both threads deadlock as a result.
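
The general pattern can be illustrated with a short sketch. It is not the NSX datapath code, which this article does not show. `entry_lock`, `add_entry`, and `clear_entry` are hypothetical names, the example is simplified to a single thread, and it uses a timeout so that it terminates instead of hanging.

```python
import threading

# Hypothetical stand-in for the lock guarding a SNAT hash entry.
# threading.Lock is non-reentrant: a second acquire blocks, even when
# it comes from the thread that already holds the lock.
entry_lock = threading.Lock()

def clear_entry(timeout_s=0.2):
    """Cleanup step that needs the entry lock. Returns True if it got the lock."""
    acquired = entry_lock.acquire(timeout=timeout_s)
    if acquired:
        entry_lock.release()
    return acquired

def add_entry(allocation_succeeds):
    """Allocation step: takes the lock, and on failure calls cleanup while still holding it."""
    entry_lock.acquire()
    try:
        if allocation_succeeds:
            return "added"
        # Failure path: cleanup runs with the lock still held, so its own
        # acquire can never succeed. Without the timeout it would block forever.
        return "cleared" if clear_entry() else "stuck"
    finally:
        entry_lock.release()
```

In this sketch `add_entry(True)` returns "added", while `add_entry(False)` returns "stuck" once the timeout expires. With a real mutex and no timeout, the thread would never return.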

Resolution

This issue is resolved in VMware NSX 4.1.2.2 and later and in VMware NSX-T Data Center 3.2.4 and later, available at Broadcom downloads.
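
When auditing several Edge Nodes, a version comparison can be scripted along the lines below. This is a rough sketch under one assumption: that the fix applies per release line, from 3.2.4 onward in the 3.x line and from 4.1.2.2 onward in the 4.x line. `FIXED_IN` and `has_fix` are illustrative names, not part of any NSX API.

```python
# First fixed release per major line, taken from the Resolution text above.
# Treating each major line independently is an assumption of this sketch.
FIXED_IN = {3: (3, 2, 4), 4: (4, 1, 2, 2)}

def parse_version(text):
    """Turn a dotted version string such as '4.1.2.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def has_fix(version_text):
    """Return True/False for 3.x and 4.x versions, or None for other release lines."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[0])
    if fixed is None:
        return None  # release line not covered by this article
    # Pad with zeros so that '3.2.4' and '3.2.4.0' compare as equal.
    width = max(len(version), len(fixed))
    padded_version = version + (0,) * (width - len(version))
    padded_fixed = fixed + (0,) * (width - len(fixed))
    return padded_version >= padded_fixed
```

For example, `has_fix("4.1.2.1")` returns False and `has_fix("3.2.4")` returns True. A result of None means the article makes no statement about that release line.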

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

If you believe you have encountered this issue and are unable to upgrade, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.