Edge Datapath mempool usage high alarm

Article ID: 330516

Products

VMware NSX

Issue/Introduction

Title: Alarm for Edge Datapath mempool usage high
Event ID: edge_health.edge_datapath_mempool_high
Alarm Description

  • Purpose: Tracks usage percentage of various mempools.
  • Impact: Functionality corresponding to the specific mempool will be impacted.

Environment

VMware NSX-T Data Center
VMware NSX

Resolution

Steps to Resolve
For 3.0.0 and higher

Recommended Action:

  • Get the mempool usage by running the 'get dataplane memory stats' command in the Edge CLI.

    Sample output for get dataplane memory stats:

    Tue Dec 10 2024 UTC 14:38:02.061
    Memory Usage
    
    Available_entries             : 1024
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 128
    Name                          : jumbo_mbuf_pool
    Size                          : 1024
    
    Available_entries             : 133924
    Available_entries_in_cache    : 570
    Cache_size_per_core           : 128
    Name                          : mbuf_pool_socket_0
    Per_core_cache
        Available_entries         : 58
        Core_id                   : 0
        Available_entries         : 183
        Core_id                   : 1
        Available_entries         : 171
        Core_id                   : 2
        Available_entries         : 158
        Core_id                   : 3
    Size                          : 217850
    
    Available_entries             : 12000
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 64
    Name                          : sess_mp_0
    Size                          : 12000
    
    Available_entries             : 12000
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 64
    Name                          : sess_priv_mp_0
    Size                          : 12000
    
    Available_entries             : 114686
    Available_entries_in_cache    : 285
    Cache_size_per_core           : 128
    Name                          : sp_pktmbuf_pool
    Per_core_cache
        Available_entries         : 163
        Core_id                   : 0
        Available_entries         : 69
        Core_id                   : 1
        Available_entries         : 18
        Core_id                   : 2
        Available_entries         : 35
        Core_id                   : 3
    Size                          : 114688
    
    Available_entries             : 16383
    Cache_size_per_core           : 0
    Name                          : fw_mon_msg
    Size                          : 16383
    
    Available_entries             : 2097146
    Available_entries_in_cache    : 2045
    Cache_size_per_core           : 512
    Name                          : pfstatepl3
    Per_core_cache
        Available_entries         : 511
        Core_id                   : 0
        Available_entries         : 513
        Core_id                   : 1
        Available_entries         : 508
        Core_id                   : 2
        Available_entries         : 513
        Core_id                   : 3
    Size                          : 2097152
    
    Available_entries             : 524288
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pffqdnippl
    Size                          : 524288
    
    Available_entries             : 524288
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pffqdnsyncpl
    Size                          : 524288
    
    Available_entries             : 524288
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pffqdndnpl
    Size                          : 524288
    
    Available_entries             : 524288
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfdnsdnpl
    Size                          : 524288
    
    Available_entries             : 262144
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pffrentpl3
    Size                          : 262144
    
    Available_entries             : 17331
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfpktpl3
    Size                          : 17331
    
    Available_entries             : 17282
    Available_entries_in_cache    : 2052
    Cache_size_per_core           : 512
    Name                          : pfsyncmbufpl3
    Per_core_cache
        Available_entries         : 513
        Core_id                   : 0
        Available_entries         : 513
        Core_id                   : 1
        Available_entries         : 513
        Core_id                   : 2
        Available_entries         : 513
        Core_id                   : 3
    Size                          : 17282
    
    Available_entries             : 49152
    Available_entries_in_cache    : 2052
    Cache_size_per_core           : 512
    Name                          : pf_fp_rule_node
    Per_core_cache
        Available_entries         : 513
        Core_id                   : 0
        Available_entries         : 513
        Core_id                   : 1
        Available_entries         : 513
        Core_id                   : 2
        Available_entries         : 513
        Core_id                   : 3
    Size                          : 49152
    
    Available_entries             : 8096
    Available_entries_in_cache    : 2052
    Cache_size_per_core           : 512
    Name                          : pf_fp_root_rule_node
    Per_core_cache
        Available_entries         : 513
        Core_id                   : 0
        Available_entries         : 513
        Core_id                   : 1
        Available_entries         : 513
        Core_id                   : 2
        Available_entries         : 513
        Core_id                   : 3
    Size                          : 8096
    
    Available_entries             : 8096
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pf_tb_root_rule_node
    Size                          : 8096
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pf_url_node
    Size                          : 1048576
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pf_dpi_conn_node
    Size                          : 1048576
    
    Available_entries             : 4194304
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfa_intattr_pl3
    Size                          : 4194304
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfa_attrconn_pl3
    Size                          : 1048576
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 220
    Cache_size_per_core           : 512
    Name                          : pf_snat_pl3
    Per_core_cache
        Available_entries         : 31
        Core_id                   : 0
        Available_entries         : 45
        Core_id                   : 1
        Available_entries         : 70
        Core_id                   : 2
        Available_entries         : 74
        Core_id                   : 3
    Size                          : 1048576
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfa_ctx_pl3
    Size                          : 1048576
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfa_key_ace_pl3
    Size                          : 1048576
    
    Available_entries             : 4194304
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pfa_value_ace_pl3
    Size                          : 4194304
    
    Available_entries             : 1048576
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : pf_hsid_pl3
    Size                          : 1048576
    
    Available_entries             : 25033
    Available_entries_in_cache    : 0
    Cache_size_per_core           : 512
    Name                          : lb_pkt_pl3
    Size                          : 25033
    


  • If usage is high in the malloc heap mempool, run grep 'malloc_heap' against the syslog file to see how the usage has grown over time. If the free space is fairly consistent across the log entries, heap usage is high but stable, and no functional or traffic disruption is expected.
  • For firewall-related mempools, identified by the prefix 'pf', high usage means the remaining capacity is low. Upgrade the Edge node to a larger form factor or increase the number of Edge nodes in the Edge cluster.
  • If the alarm was raised after a version upgrade of the Edge node or after a new configuration is pushed, increase the Edge node's form factor.
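
The sample output above lists Size and Available_entries for each mempool but not a usage percentage. The sketch below is one way to derive it from a saved copy of that output, assuming usage is (Size - Available_entries) / Size and that the top-level Available_entries value already includes the per-core cache entries (the sample figures are consistent with this, but it is an assumption). The function name mempool_usage and the file name in the usage comment are hypothetical, and the alarm's own calculation may differ.

```shell
#!/bin/sh
# mempool_usage: read saved 'get dataplane memory stats' output on stdin and
# print one line per mempool in the form "<pool_name> <used_percent>".
# Assumption: used = Size - Available_entries (top-level value only).
mempool_usage() {
  awk -F':' '
    # Strip leading and trailing whitespace from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t\r]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    # Only the first Available_entries of a pool is the pool-wide figure;
    # later ones belong to the Per_core_cache section and are skipped.
    key == "Available_entries" && !have_avail { avail = val + 0; have_avail = 1 }
    key == "Name" { name = val }
    # Size is the last field of each pool, so it closes the record.
    key == "Size" {
      size = val + 0
      if (have_avail && size > 0)
        printf "%s %.1f\n", name, (size - avail) * 100 / size
      have_avail = 0; name = ""
    }
  '
}

# Usage (memory-stats.txt is a hypothetical file holding the CLI output):
#   mempool_usage < memory-stats.txt | sort -k2,2 -n -r | head
```

Sorting by the second column puts the most heavily used pools first, which makes it easier to see which 'pf' pools, if any, are close to exhaustion.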

Maintenance window required for remediation? Yes

To determine the root cause of the mempool exhaustion and its dataplane impact, gather the following details while the issue is occurring and open a case with Broadcom Support:

  • Log in over SSH as root to the active Edge node for the affected Tier-0 or Tier-1 gateway and run the commands below. Each loop captures a different set of statistics once every 60 seconds and runs until it is interrupted, so keep the loops running for 5 to 10 minutes and then stop them. The output files are written to the /image directory:

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/getmaxstates | json_pp > /image/edge-appctl-fw-getmaxstates-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl mempool/show pfstatepl3 | json_pp > /image/edge-appctl-fw-show-pfstatepl3-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/lr/show total-stats | json_pp > /image/edge-appctl-fw-lr-total-stats-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/get_debug_count | json_pp > /image/edge-appctl-fw-get_debug_count-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl mempool/show pf_snat_pl3 | json_pp > /image/edge-appctl-fw-show-pf_snat_pl3-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do edge-appctl -t /var/run/vmware/edge/dpd.ctl mempool/show | json_pp > /image/edge-appctl-fw-show-mempool-$(date -u +"%FT%H%M%SZ").json; sleep 60; done

    while true; do su admin -c "get firewall <fw_int_UUID> connection count" > /image/get-fw-UUID-conn-count-$(date -u +"%FT%H%M%SZ").txt; sleep 60; done

    In the last command above, replace <fw_int_UUID> with the ifuuid of a firewall interface. To list the firewall interfaces and their ifuuid values, run:

    edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/lr/show | json_pp | egrep -i 'iftype|ifuuid|name'

  • While the issue is still ongoing, gather log bundles from the Edge nodes that host the affected Tier-0 or Tier-1 gateways.
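
The collection loops above run until they are interrupted. As an alternative, the sketch below bounds each loop to a fixed number of samples so that several collectors can run in the background and stop on their own. The collect function and the OUT_DIR variable are hypothetical helpers, not part of NSX; the commands in the usage comments are the same ones listed above.

```shell
#!/bin/sh
# Directory for the output files; defaults to /image as in the steps above.
OUT_DIR="${OUT_DIR:-/image}"

# collect NAME EXT COUNT INTERVAL COMMAND
# Run COMMAND (a shell command string, pipes allowed) COUNT times, INTERVAL
# seconds apart, writing each run to $OUT_DIR/NAME-<UTC timestamp>.EXT.
collect() {
  name=$1; ext=$2; count=$3; interval=$4; cmd=$5
  i=1
  while [ "$i" -le "$count" ]; do
    stamp=$(date -u +"%FT%H%M%SZ")
    sh -c "$cmd" > "$OUT_DIR/$name-$stamp.$ext"
    # Skip the final sleep so the collector exits right after the last sample.
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# Usage: 10 samples, 60 seconds apart (about 10 minutes), run in parallel.
#   CTL=/var/run/vmware/edge/dpd.ctl
#   collect edge-appctl-fw-getmaxstates json 10 60 "edge-appctl -t $CTL fw/getmaxstates | json_pp" &
#   collect edge-appctl-fw-show-mempool json 10 60 "edge-appctl -t $CTL mempool/show | json_pp" &
#   wait
```

Each backgrounded call runs in its own subshell, so the collectors do not interfere with one another, and wait returns once all of them have taken their last sample.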