Title: Alarm for snat_port_usage_on_gateway_is_high
Event ID: nat.snat_port_usage_on_gateway_is_high
Alarm Description: SNAT port usage on the Gateway is high.
Purpose: The alarm is raised to indicate high port usage on an SNAT IP address, which can lead to new flows being dropped. It is raised when port usage exceeds 80% of the total available range.
Reason: Any TCP/UDP flow that matches a NAT rule with the action SNAT also undergoes source port translation (typically referred to as PAT). The range of ports available for source port translation is limited per SNAT translation IP address, because TCP and UDP port numbers are 16 bits long. Part of this 16-bit range is reserved as well-known ports, which leaves only a subset of ports available for PAT. Therefore, at any time only a fixed number of simultaneous flows can undergo SNAT translation for an IP address used as the translation IP across SNAT rules. When the number of simultaneous flows exceeds a system-defined threshold of the overall available port range, this alarm is generated.
Impact: Once the port range is 100% utilized, new TCP/UDP flows cannot allocate a port for translation and are dropped. Under this condition, high datapath CPU utilization may also be observed.
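The port arithmetic described under Purpose and Reason can be sketched as follows. This is only an illustration, not the NSX implementation: the article does not state the exact size of the reserved pool, so the sketch assumes the conventional well-known range (ports 0 to 1023) is excluded. The function names are hypothetical; the 80% figure is taken from the Purpose field.

```python
TOTAL_PORTS = 2 ** 16        # TCP/UDP port numbers are 16 bits long
RESERVED_PORTS = 1024        # ASSUMPTION: well-known ports 0-1023 are excluded from PAT
ALARM_THRESHOLD = 0.80       # from the Purpose field: alarm at 80% usage


def snat_port_usage(active_flows, reserved_ports=RESERVED_PORTS):
    """Return the fraction of the usable PAT range consumed on one SNAT IP."""
    usable_ports = TOTAL_PORTS - reserved_ports
    return active_flows / usable_ports


def alarm_would_fire(active_flows, threshold=ALARM_THRESHOLD):
    """Return True when usage on one SNAT IP crosses the alarm threshold."""
    return snat_port_usage(active_flows) > threshold


if __name__ == "__main__":
    for flows in (10_000, 51_609, 51_610, 64_512):
        print(flows, f"{snat_port_usage(flows):.1%}", alarm_would_fire(flows))
```

Under these assumptions a single SNAT IP has 64,512 usable ports, so the alarm would correspond to roughly 51,600 simultaneous flows. The real figures depend on the port range NSX actually allocates from.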
Environment
VMware NSX-T Data Center
VMware NSX
Resolution
Steps to Resolve For 3.2.0 and above
Check the usage of the SNAT IP by examining the UDP/TCP flows on the NSX Edge node where the SNAT IP is used, as follows:
Log in as the admin user on the Edge node and run the NSX CLI command `get firewall <LR_INT_UUID> connection state`. LR_INT_UUID is the UUID of the interface to which the SNAT rule is applied. If the SNAT rule is not applied to a specific interface, use the UUID of any Uplink interface of the logical router.
Review the UDP/TCP flows that are listed.
Check the flows for any denial-of-service attack or anomalous burst.
For any denial-of-service attack, consider limiting usage of the NAT rule for the source of the attack (e.g., by applying appropriate firewall rules).
If the traffic appears to be within the normal load but the alarm threshold is hit, consider adding more SNAT IP addresses to distribute the load, or routing new traffic to another Edge node.
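The last two checks above can be roughly sketched as follows: spotting a source that dominates the flow table, and estimating how many SNAT IP addresses would keep usage under the alarm threshold. This sketch makes several assumptions. It does not parse the CLI output, whose format is not shown in this article; it expects the flows to have already been extracted into hypothetical `(source_ip, protocol)` tuples. It also assumes 64,512 usable ports per SNAT IP (the 16-bit range minus ports 0 to 1023) and an even spread of flows across the added addresses.

```python
from collections import Counter
from math import ceil

USABLE_PORTS_PER_IP = 2 ** 16 - 1024   # ASSUMPTION: well-known ports excluded


def top_sources(flows, n=3):
    """Count flows per source IP and return the n busiest sources.

    `flows` is an iterable of (source_ip, protocol) tuples, a hypothetical
    pre-parsed form of the connection listing.
    """
    counts = Counter(source_ip for source_ip, _protocol in flows)
    return counts.most_common(n)


def snat_ips_needed(peak_flows, target_usage=0.80,
                    usable_ports=USABLE_PORTS_PER_IP):
    """Estimate how many SNAT IPs keep per-IP usage at or below target_usage.

    ASSUMPTION: flows are distributed evenly across the SNAT IPs.
    """
    return max(1, ceil(peak_flows / (usable_ports * target_usage)))


if __name__ == "__main__":
    sample = [("10.0.0.5", "tcp")] * 5 + [("10.0.0.9", "udp")] * 2
    print(top_sources(sample, n=2))
    print(snat_ips_needed(60_000))
```

A single source holding a large share of the flows suggests an attack or an anomalous burst; a broad, even spread that still approaches the threshold points toward adding SNAT IP addresses instead.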