NSX-T 3.1.3.6 Edge configured with an L4 LB stops passing all traffic

Article ID: 318290

Products

VMware NSX

Issue/Introduction

  • NSX-T Manager reports an alarm "Edge node datapath mempool is high"
  • The Edge syslog (/var/log/syslog) reports hugepage exhaustion in the firewall service (the size values may vary depending on configuration):
2022-01-28T11:56:32.029Z edge-2 NSX 9444 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="ERROR"] Memory resource from hugepage exhausted in the firewall service size=1112(0M)
2022-01-31T13:22:25.486Z edge-0 NSX 10289 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="WARN"] vrfid 12000 id 0x1c000003d0001c92 purge: TCP 1.1.1.12:80 gwy 2.2.2.1:80 ext 10.10.254.254:48316 seq [lo=3004521833 hi=3004586969 win=237 modulator=0 wscale=7] [lo=937666734 hi=937697070 win=510 modulator=0 wscale=7] 9:9
  • On the Edge, the "Refcnt error" counters are large and keep increasing:
# edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/get_debug_count verbose | python -m "json.tool"
[
    {
        "ifuuid": "0ea201da-####-####-####-fdf4013c79a6",
        "vrf": 11,
        "Total states": 0,
        "Refcnt error": 84805,   <<<<<<<<<<
        "List insert error": 0,
        "List remove error": 0,
        "Service core mismatch": 0,
        "Purge reuse": 0,
        "Sloppy conversion": 0,
        "Valid states": 0,
        "Closing states": 0,
        "Unlink states": 0,
        "Purge states": 0,
        "Purge clean states": 0,
        "List valid states": 0,
        "List closing states": 0,
        "List unlink states": 0,
        "List purge states": 0,
        "List purge clean states": 0
    },
    {
        "ifuuid": "32a40265-####-####-####-e84e0064bfd8",
        "vrf": 308,
        "Total states": 0,
        "Refcnt error": 77072,   <<<<<<<
        "List insert error": 0,
        "List remove error": 0,
        "Service core mismatch": 0,
        "Purge reuse": 0,
        "Sloppy conversion": 0,
        "Valid states": 0,
        "Closing states": 0,
        "Unlink states": 0,
        "Purge states": 0,
        "Purge clean states": 0,
        "List valid states": 0,
        "List closing states": 0,
        "List unlink states": 0,
        "List purge states": 0,
        "List purge clean states": 0
    },
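Checking whether the "Refcnt error" counters are increasing means comparing two samples of the command output taken some time apart. The following is a minimal sketch of that comparison, not a VMware-provided tool. It assumes each sample is the complete JSON array printed by fw/get_debug_count, saved without the "<<<<" annotation markers shown above. The function names refcnt_errors and growing_interfaces are hypothetical, and the sample values in the demo are made up for illustration.

```python
import json


def refcnt_errors(raw_json):
    """Map each interface UUID to its "Refcnt error" counter.

    raw_json is assumed to be the full JSON array from fw/get_debug_count.
    """
    entries = json.loads(raw_json)
    return {entry["ifuuid"]: entry["Refcnt error"] for entry in entries}


def growing_interfaces(earlier, later):
    """Return {ifuuid: delta} for interfaces whose counter rose between samples."""
    return {
        ifuuid: count - earlier.get(ifuuid, 0)
        for ifuuid, count in later.items()
        if count > earlier.get(ifuuid, 0)
    }


# Illustrative samples only; real output contains many more fields per entry.
first_sample = '[{"ifuuid": "if-a", "Refcnt error": 100}, {"ifuuid": "if-b", "Refcnt error": 0}]'
second_sample = '[{"ifuuid": "if-a", "Refcnt error": 250}, {"ifuuid": "if-b", "Refcnt error": 0}]'

print(growing_interfaces(refcnt_errors(first_sample), refcnt_errors(second_sample)))
# prints {'if-a': 150}
```

An interface reported here, together with the hugepage exhaustion messages in syslog, is consistent with the symptoms described in this article. A counter that is non-zero but stable between samples is not reported by this sketch.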



Environment

NSX-T Data Center 3.1.3.6

NSX-T Load Balancer is configured with a Layer 4 Virtual Server

Cause

This issue is caused by a memory leak triggered by traffic to an L4 LB VIP.
Memory associated with L4 LB sessions is not released when the connections close or terminate. Over time, datapathd memory is exhausted and the Edge can no longer process traffic.

Resolution

This issue is resolved in NSX-T Data Center 3.1.3.7.

Workaround:
The Edge node experiencing the issue must be rebooted.