Edge node experiencing high Datapath Mempool alarm for pfstatepl3 and pf_snat_pl3 leading to packet loss

Article ID: 401369

Updated On:

Products

VMware NSX, VMware vDefend Firewall

Issue/Introduction

  • In NSX, an active Edge node shows the alarm "Edge Datapath Mempool High".
  • Under System > Fabric > Nodes > Monitor, the Edge details show high utilization for a few memory pools, specifically pfstatepl3 and pf_snat_pl3.

  • An SNAT rule is configured on the Tier-1 or Tier-0 gateway.
  • The Edge datapath memory usage is under 70% and CPU usage is also low.
  • You're experiencing packet loss and slow traffic for Tier-1/Tier-0 gateways associated with the affected Edge node.
  • This issue may be intermittent, and traffic may flow properly for short periods throughout the day.
  • One or more Tier-1 or Tier-0 gateways may show high connection counts with the following command from root shell of affected NSX Edge node:
    root#: edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/lr/show total-stats | json_pp
    [
      {
          "uuid": "<UUID>",
          "vrf": 1,
          "pvi": 3,
          "config-loaded": true,
          "active": true,
          "name": "SR-<Gateway-Name>",
          "type": "SERVICE_ROUTER_TIER0",
          "mp-router-id": "<UUID>",
          "sync-enabled": true,
          "connection-count": 4194124,                <=========== High number of connections
  • To determine the origin of the high number of connections, examine the Edge's syslog. The log provides the NAT-translated IP address and the VRF of the gateway from which the connections originate:
    NSX 1894731 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" tname="dp-fw-purge11" level="ERROR"] pf_snat_port_delete_alarm_processing: failed to find snat hash entry. NAT addr: #######, daddr: #######, dport:#####, vrf: ###, num of snat ips crossing threshold: 0
    • The NAT addr will be a hex string. To convert it into an IP address, split it into 4 octets and convert each to decimal.
      • An example: 7F000001
        • Octet 1: 7F → 127
        • Octet 2: 00 → 0
        • Octet 3: 00 → 0
        • Octet 4: 01 → 1
        • Result: 127.0.0.1
    • Now that you know the Tier-1/Tier-0 gateway and the IP used, you can identify which NAT rule(s) the connections are coming from and what the Source IPs are.
    • You can view the statistics for the NAT rule from the UI:
      Networking > NAT > Gateway
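
The hex-to-IP conversion described above can also be scripted when many log lines need to be checked. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the NAT addr field is a plain hex string as in the 7F000001 example and that the vrf field is a decimal number. The function names and the values filled into the masked fields of the sample line are hypothetical.

```python
import ipaddress
import re

# Matches the "NAT addr" and "vrf" fields named in the article's sample log line.
# Assumption: NAT addr is plain hex (no prefix) and vrf is decimal.
SNAT_LOG_RE = re.compile(r"NAT addr:\s*(?P<nat>[0-9A-Fa-f]+),.*?vrf:\s*(?P<vrf>\d+)")


def hex_to_ipv4(hex_addr: str) -> str:
    """Convert a hex string such as 7F000001 to dotted-decimal notation."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def parse_snat_log(line: str):
    """Return (nat_ip, vrf) from a matching log line, or None if nothing matches."""
    match = SNAT_LOG_RE.search(line)
    if match is None:
        return None
    return hex_to_ipv4(match.group("nat")), int(match.group("vrf"))


if __name__ == "__main__":
    # Hypothetical values stand in for the masked (#####) fields of the log line.
    sample = (
        "pf_snat_port_delete_alarm_processing: failed to find snat hash entry. "
        "NAT addr: 7F000001, daddr: 0A000001, dport:443, vrf: 1, "
        "num of snat ips crossing threshold: 0"
    )
    print(parse_snat_log(sample))
```

Each matching line yields the translated IP address and the VRF, which can then be compared against the NAT rules on the corresponding gateway.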

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
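
When an Edge node hosts many gateways, the output of the fw/lr/show total-stats command shown earlier can be sorted to find the gateways with the most connections. The following Python sketch is illustrative only. It assumes the command returns a JSON array of objects with the "name", "type", and "connection-count" fields seen in the excerpt, and that the output has been saved and is processed wherever Python is available. The function name rank_by_connections is hypothetical.

```python
import json
import sys


def rank_by_connections(routers):
    """Sort gateway entries by "connection-count", highest first."""
    return sorted(routers, key=lambda r: r.get("connection-count", 0), reverse=True)


if __name__ == "__main__":
    # Reads the JSON produced by "fw/lr/show total-stats" from standard input.
    for router in rank_by_connections(json.load(sys.stdin)):
        count = router.get("connection-count", 0)
        print(f'{count:>12}  {router.get("type", "?")}  {router.get("name", "?")}')
```

For example, if the command output was saved to a file and the script as rank.py (both names are placeholders), running `python3 rank.py < total-stats.json` lists the gateways from the highest connection count to the lowest.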

Environment

VMware NSX

Cause

This issue occurs when VMs on a segment establish a large number of connections, which exhausts the datapath memory pools available for handling these connections.

A common scenario is a virtual machine that performs excessive network scanning and exhausts the connection limits.

 

Resolution

 

This is not an NSX issue. However, a workaround can be implemented to prevent a single gateway from taking on too many connections and exhausting the mempool.

Workaround:

Additional Information

If this article did not help resolve your issue, you can review the following article for further information about Edge Datapath mempool usage high alarm