NSX UI Alarm "SNAT ports usage on logical router <Logical Router UUID> for SNAT IP <SNAT IP> has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit."

Article ID: 375776


Products

VMware NSX

Issue/Introduction

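  • An alarm similar to the following is observed in the NSX UI:
    "SNAT ports usage on logical router <Logical Router UUID> for SNAT IP <SNAT IP> has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit."
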

  • The SNAT rule may be defined to perform NAT to a specific IP, similar to the following:
    "rule <Rule Number> at 18 out protocol any natpass from any to ip <Source Network IP>/<Subnet Prefix Length> snat ip <SNAT IP>".

  • In the edge node log file /var/log/syslog.log, entries similar to the following are seen:
    edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="<Logical Router UUID>" tid="5146" level="FATAL" eventState="On" eventFeatureName="nat" eventSev="critical" eventType="snat_port_usage_on_gateway_is_high"] SNAT ports usage on logical router <Logical Router UUID> for SNAT IP <SNAT IP> has reached the high threshold value of 80%. New flows will not be SNATed when usage reaches the maximum limit.
  • In the same log file, around the same time, there may be entries indicating edge fp-eth receive ring buffer exhaustion:
    edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="<Edge Node UUID>" tid="4962" level="WARNING" eventState="On" eventFeatureName="edge_health" eventSev="warning" eventType="edge_nic_out_of_receive_buffer"] Edge NIC fp-eth1 receive ring buffer has overflowed by 38.949272% on Edge node <Edge Node UUID>. The missed packet count is 77978 and processed packet count is 122226.
  • In the same log file, around the same time, there may be entries indicating high edge node CPU usage:
    edge-hostname NSX 2079 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_datapath_cpu_very_high" eventSev="critical" eventState="On"] The datapath CPU usage on Edge node <Edge Node UUID> has reached 99.99% which is at or above the very high threshold for at least two minutes.
  • Packet drops and BFD flaps may be observed if datapath CPU usage reaches 100%.
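
To check whether the three events above occur together, the edge syslog can be filtered on the eventType field. The following is a minimal sketch, not an official tool: it assumes each entry carries a `[nsx@6876 key="value" ...]` block as in the samples above, and the function name `parse_event` is hypothetical.

```python
import re

# Captures the key="value" block after "nsx@6876" and the free-text message that follows.
SD_RE = re.compile(r'\[nsx@6876 ([^\]]*)\]\s*(.*)')
# Captures individual key="value" pairs inside that block.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

# Event types taken from the log samples in this article.
EVENTS_OF_INTEREST = {
    "snat_port_usage_on_gateway_is_high",
    "edge_nic_out_of_receive_buffer",
    "edge_datapath_cpu_very_high",
}


def parse_event(line):
    """Return the entry's fields as a dict, or None if the line is not relevant."""
    match = SD_RE.search(line)
    if match is None:
        return None
    fields = dict(PAIR_RE.findall(match.group(1)))
    if fields.get("eventType") not in EVENTS_OF_INTEREST:
        return None
    fields["message"] = match.group(2).strip()
    return fields


sample = (
    'edge-hostname NSX 4756 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" '
    'entId="<Logical Router UUID>" tid="5146" level="FATAL" eventState="On" '
    'eventFeatureName="nat" eventSev="critical" '
    'eventType="snat_port_usage_on_gateway_is_high"] SNAT ports usage on logical '
    'router <Logical Router UUID> for SNAT IP <SNAT IP> has reached the high '
    'threshold value of 80%.'
)

event = parse_event(sample)
print(event["eventType"], event["eventSev"])
```

Running `parse_event` over every line of the log file and grouping the results by timestamp would show whether the SNAT alarm, the ring buffer overflow, and the CPU alarm cluster in the same time window.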

Environment

VMware NSX-T Data Center  

VMware NSX

Cause

Heavy use of an SNAT rule that translates to a single SNAT IP and port leads to SNAT port exhaustion for that SNAT IP and to high edge node CPU usage.

Resolution

This issue is resolved in VMware NSX 4.2.2 and VCF 9.0, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

Note: In these releases, the SNAT port allocation algorithm was improved to mitigate high CPU usage when the SNAT ports are exhausted.

Workaround

For SNAT rules that are expected to carry a large number of simultaneous open connections, configure the SNAT rule (or edit the existing SNAT rule) to perform NAT to multiple IP addresses.

For example, change the existing SNAT rule from "....from any to ip xx.xx.xx.xx/16 snat ip <SNAT IP>" to "....from any to ip xx.xx.xx.xx/16 snat ip <SNAT IP>/24".
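
The effect of widening the SNAT target can be estimated with simple arithmetic. The sketch below is a rough model only. The number of usable ports per SNAT IP (`ASSUMED_PORTS_PER_IP`) is an assumption, not a documented NSX value, and the model assumes every address in the prefix is usable and that each flow consumes one port. The 80% threshold comes from the alarm text. The addresses are documentation examples and all function names are hypothetical.

```python
import ipaddress

HIGH_THRESHOLD_PCT = 80        # high threshold quoted in the alarm text
ASSUMED_PORTS_PER_IP = 64512   # assumption: ports 1024-65535 usable per SNAT IP


def pool_capacity(snat_target, ports_per_ip=ASSUMED_PORTS_PER_IP):
    """Total SNAT ports for a target given as a single IP or as IP/prefix."""
    # strict=False accepts host bits in the address, e.g. "192.0.2.10/24".
    network = ipaddress.ip_network(snat_target, strict=False)
    return network.num_addresses * ports_per_ip


def usage_pct(active_flows, snat_target, ports_per_ip=ASSUMED_PORTS_PER_IP):
    """Percentage of the SNAT port pool consumed by the active flows."""
    return 100.0 * active_flows / pool_capacity(snat_target, ports_per_ip)


def crosses_high_threshold(active_flows, snat_target):
    """True if the modelled usage would reach the high threshold."""
    return usage_pct(active_flows, snat_target) >= HIGH_THRESHOLD_PCT


flows = 55000  # hypothetical number of simultaneous flows
for target in ("192.0.2.10", "192.0.2.10/24"):
    print(target, round(usage_pct(flows, target), 2), crosses_high_threshold(flows, target))
```

Under these assumptions, the same number of flows that exceeds the threshold on a single SNAT IP uses only a small fraction of the pool once the rule translates to a /24.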