Alarm "SNAT ports usage on logical router xxxxx-xx-xx-xx-xxxxx for SNAT IP xx.xx.xx.xx has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit." triggered on NSX Manager UI.

Article ID: 375776

Updated On:

Products

VMware NSX

Issue/Introduction

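  • NSX Manager reports an alarm similar to the following:

    SNAT ports usage on logical router xxxxx-xx-xx-xx-xxxxx for SNAT IP xx.xx.xx.xx has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit.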

  • The SNAT rule may be defined to translate to a single IP address, for example: "rule xxxx at 18 out protocol any natpass from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx".

  • The edge node log file /var/log/syslog.log contains entries similar to the following:

    2024-08-13T16:11:00.419Z edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="xxxxx-xx-xx-xx-xxxxx" tid="5146" level="FATAL" eventState="On" eventFeatureName="nat" eventSev="critical" eventType="snat_port_usage_on_gateway_is_high"] SNAT ports usage on logical router xxxxx-xx-xx-xx-xxxxx for SNAT IP xx.xx.xx.xx has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit.

  • Around the same time, the same edge node syslog.log may contain entries reporting receive ring buffer exhaustion:

    2024-08-13T16:08:58.429Z edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="xxxxx-xx-xx-xx-xxxxx" tid="4962" level="WARNING" eventState="On" eventFeatureName="edge_health" eventSev="warning" eventType="edge_nic_out_of_receive_buffer"] Edge NIC fp-eth1 receive ring buffer has overflowed by 38.949272% on Edge node xxxxx-xx-xx-xx-xxxxx. The missed packet count is 77978 and processed packet count is 122226.

  • Around the same time, the same edge node syslog.log may report a spike in the CPU usage of the edge node:

    2024-08-13T16:21:16.308Z edge-hostname NSX 2079 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_datapath_cpu_very_high" eventSev="critical" eventState="On"] The datapath CPU usage on Edge node xxxxx-xx-xx-xx-xxxxx has reached 99.99% which is at or above the very high threshold for at least two minutes.
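To check whether the three symptoms above occur together, the syslog can be filtered by event type. The following is a minimal sketch, not a supported tool: it assumes the log lines keep the `eventType="..."` field and the leading ISO timestamp shown in the samples above, and the function name `find_related_events` is hypothetical.

```python
import re

# Event types copied from the sample log entries in this article.
EVENT_TYPES = {
    "snat_port_usage_on_gateway_is_high",
    "edge_nic_out_of_receive_buffer",
    "edge_datapath_cpu_very_high",
}

# Assumption: each entry starts with a timestamp and carries eventType="...".
TIMESTAMP_RE = re.compile(r"^\s*(\S+)")
EVENT_TYPE_RE = re.compile(r'eventType="([^"]+)"')


def find_related_events(lines):
    """Return sorted (timestamp, event_type) pairs for the event types above."""
    matches = []
    for line in lines:
        event = EVENT_TYPE_RE.search(line)
        if not event or event.group(1) not in EVENT_TYPES:
            continue
        timestamp = TIMESTAMP_RE.match(line)
        matches.append((timestamp.group(1) if timestamp else "", event.group(1)))
    # Timestamps in the same ISO format sort chronologically as strings.
    return sorted(matches)


if __name__ == "__main__":
    # Path taken from this article; adjust if the log is stored elsewhere.
    with open("/var/log/syslog.log", encoding="utf-8", errors="replace") as log:
        for timestamp, event_type in find_related_events(log):
            print(timestamp, event_type)
```

If the output shows all three event types within a few minutes of each other, the pattern matches the symptoms described in this article.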

Environment

VMware NSX-T Data Center  

VMware NSX

Cause

The SNAT port allocation algorithm is not optimized for a large number of flows translated to a single SNAT IP.

Resolution

Change the SNAT rule to translate to multiple IP addresses instead of a single IP address. For example, modify the rule "....from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx" to "....from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx/24".
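The arithmetic behind this change can be illustrated with a rough sketch. It does not model the actual NSX port allocation algorithm: the per-IP port count `ASSUMED_PORTS_PER_IP` is an illustrative assumption, every address in the translated range is assumed to be usable, and the function names are hypothetical. Only the 80% threshold comes from the alarm text.

```python
import ipaddress

# Assumption for illustration only: source ports 1024-65535 per SNAT IP.
# The real per-IP port pool on an NSX edge may differ.
ASSUMED_PORTS_PER_IP = 64512

# High threshold quoted in the alarm text.
HIGH_THRESHOLD = 0.80


def pool_capacity(snat_network, ports_per_ip=ASSUMED_PORTS_PER_IP):
    """Approximate port capacity of a translated IP or CIDR range."""
    network = ipaddress.ip_network(snat_network, strict=False)
    # Assumption: every address in the range can be used for translation.
    return network.num_addresses * ports_per_ip


def exceeds_high_threshold(active_flows, snat_network):
    """Return True if the flows would use at least 80% of the assumed pool."""
    return active_flows / pool_capacity(snat_network) >= HIGH_THRESHOLD


if __name__ == "__main__":
    flows = 55000  # hypothetical number of concurrent translated flows
    # 192.0.2.0/24 is a documentation range, used here as a placeholder.
    for target in ("192.0.2.10/32", "192.0.2.0/24"):
        print(target, pool_capacity(target), exceeds_high_threshold(flows, target))
```

Under these assumptions, the same flow count that crosses the threshold on a single translated IP uses well under 1% of the pool once the rule translates to a /24.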