Alarm "SNAT ports usage on logical router xxxxx-xx-xx-xx-xxxxx for SNAT IP xx.xx.xx.xx has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit." triggered on NSX Manager UI.

Article ID: 375776

Products

VMware NSX

Issue/Introduction

NSX Manager reports an alarm similar to the one shown in the title.
The SNAT rule may be defined to perform NAT to a specific IP, for example: "rule xxxx at 18 out protocol any natpass from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx".
The edge node log file /var/log/syslog.log contains entries similar to the following:

2024-08-13T16:11:00.419Z edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="xxxxx-xx-xx-xx-xxxxx" tid="5146" level="FATAL" eventState="On" eventFeatureName="nat" eventSev="critical" eventType="snat_port_usage_on_gateway_is_high"] SNAT ports usage on logical router xxxxx-xx-xx-xx-xxxxx for SNAT IP xx.xx.xx.xx has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit.
Around the same time, the same edge node syslog.log may contain entries about receive ring buffer exhaustion:

2024-08-13T16:08:58.429Z edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="00000000-0000-0000-0000-000000000001" tid="4962" level="WARNING" eventState="On" eventFeatureName="edge_health" eventSev="warning" eventType="edge_nic_out_of_receive_buffer"] Edge NIC fp-eth1 receive ring buffer has overflowed by 38.949272% on Edge node 00000000-0000-0000-0000-000000000001. The missed packet count is 77978 and processed packet count is 122226.
Around the same time, the same edge node syslog.log may also report a spike in the CPU usage of the edge node:

2024-08-13T16:21:16.308Z edge-hostname NSX 2079 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_datapath_cpu_very_high" eventSev="critical" eventState="On"] The datapath CPU usage on Edge node xxxxx-xx-xx-xx-xxxxx has reached 99.99% which is at or above the very high threshold for at least two minutes.
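
The three symptoms above can be correlated by scanning the edge node syslog for their eventType values. The following is a minimal sketch, not a VMware-provided tool: the function name find_events and the regular expression are illustrative assumptions, and only the eventType strings and the timestamp-first line layout are taken from the excerpts above.

```python
import re

# Event types quoted in the log excerpts above.
EVENT_TYPES = {
    "snat_port_usage_on_gateway_is_high",
    "edge_nic_out_of_receive_buffer",
    "edge_datapath_cpu_very_high",
}

# ASSUMPTION: each entry starts with a timestamp token and carries an
# eventType="..." field, as in the sample lines above.
LINE_RE = re.compile(r'^(?P<ts>\S+)\s.*?\beventType="(?P<event>[^"]+)"')


def find_events(lines):
    """Return (timestamp, eventType) pairs for the three alarms of interest."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("event") in EVENT_TYPES:
            hits.append((match.group("ts"), match.group("event")))
    return hits
```

The function accepts any iterable of lines, so an open file handle for /var/log/syslog.log can be passed directly. Matching events that cluster within a few minutes of each other suggest the pattern described in this article.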

Environment

VMware NSX 3.x
VMware NSX 4.x

Cause

The SNAT port allocation algorithm is not optimally designed for a large number of flows to a specific SNAT IP.

Resolution

Change the SNAT rule to perform NAT to multiple IP addresses. For example, modify the rule "....from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx" to "....from any to ip xx.xx.xx.xx/16 snat ip xx.xx.xx.xx/24".
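
To illustrate why a wider translated range helps, the sketch below compares the nominal port pool of a single SNAT IP with that of a /24. It is a rough estimate under stated assumptions: the per-IP figure of 65535 ports is a theoretical upper bound rather than a documented NSX value, the function name snat_pool_capacity is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range, not from this article.

```python
import ipaddress

# ASSUMPTION: theoretical upper bound of source ports per SNAT IP.
# The range NSX actually uses is not stated in this article and may be smaller.
PORTS_PER_IP = 65535


def snat_pool_capacity(translated, ports_per_ip=PORTS_PER_IP):
    """Estimate the nominal port pool for a translated IP or CIDR block."""
    # A bare address is treated as a /32; strict=False tolerates host bits.
    network = ipaddress.ip_network(translated, strict=False)
    # ASSUMPTION: every address in the block is usable for translation.
    return network.num_addresses * ports_per_ip


single = snat_pool_capacity("192.0.2.10")    # one SNAT IP
pool = snat_pool_capacity("192.0.2.0/24")    # 256 SNAT IPs
print(pool // single)
```

Under these assumptions, the /24 offers 256 times the nominal pool of a single address, which spreads flows across more SNAT IPs and lowers the usage percentage reported per IP.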
