- Intermittently, access to load balanced pool members is unavailable.
- The load balancer generally services a high rate of traffic.
- In /var/log/lb/<Loadbalancer-UUID>/logs/error.log on the Edge Node hosting the active instance of the T1 Service Router that the affected load balancer is attached to, you observe the following errors:
2020/12/16 20:15:04 [debug] 9744#0: SNAT: alloc SNAT from Pool: nat_1681932289_1
2020/12/16 20:15:04 [debug] 9744#0: SNAT: alloc poot from IP: nat4_XXX.XXX.XXX.XXX, next port:65535.
2020/12/16 20:15:04 [error] 9744#0: SNAT: alloc SNAT from Pool: nat_1681932289_1 failed
2020/12/16 20:15:04 [error] 9744#0: l4lb failed to get snat resource
*** NOTE: The exact date, time, IP addresses, and NAT pool values shown are examples.
- The next port to be allocated is 65535, which is the highest TCP port number.
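To check how often this symptom occurs, the error log can be scanned for the two error messages shown above. The following is a minimal sketch, not a supported tool: the function name summarize_snat_failures is hypothetical, and the matching assumes the log lines keep the same wording as the example excerpt.

```python
import re
from collections import Counter

# Message text copied from the example error.log excerpt; assumed to be stable.
EXHAUSTION_MARKER = "l4lb failed to get snat resource"
# Captures the NAT pool name from a failed allocation message.
POOL_FAILURE = re.compile(r"SNAT: alloc SNAT from Pool: (\S+) failed")


def summarize_snat_failures(lines):
    """Return (exhaustion error count, failed allocations per NAT pool)."""
    exhaustion_count = 0
    failures_by_pool = Counter()
    for line in lines:
        if EXHAUSTION_MARKER in line:
            exhaustion_count += 1
        match = POOL_FAILURE.search(line)
        if match:
            failures_by_pool[match.group(1)] += 1
    return exhaustion_count, dict(failures_by_pool)


# Example input taken from the excerpt above (values are examples only).
sample_lines = [
    "2020/12/16 20:15:04 [debug] 9744#0: SNAT: alloc SNAT from Pool: nat_1681932289_1",
    "2020/12/16 20:15:04 [error] 9744#0: SNAT: alloc SNAT from Pool: nat_1681932289_1 failed",
    "2020/12/16 20:15:04 [error] 9744#0: l4lb failed to get snat resource",
]
sample_result = summarize_snat_failures(sample_lines)
print(sample_result)
```

To run this against a real log, pass an open file object for error.log in place of sample_lines; the function only iterates over lines, so it does not need the whole file in memory.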
This behavior occurs when the volume of traffic traversing the load balancer exhausts the TCP source ports available for SNAT. By default, NSX-T Server Pools use "SNAT Automap Mode", which provides a single IP address for all SNAT connections to backend LB pool members.

If this issue is encountered, the recommended resolution is to increase the number of IP addresses used for those SNAT connections, which in turn increases the number of TCP source ports available for SNAT. To do this, change the Server Pool SNAT method to "SNAT IP Pool" and provide a range of multiple IP addresses to SNAT from. Any IP addresses may be used in the SNAT IP Pool as long as they are not duplicates and no firewall prevents them from reaching the backend servers in the pool over the specified port.
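As a rough sizing aid, the number of SNAT IP addresses can be estimated from the peak number of concurrent SNAT connections. The sketch below is an illustration under stated assumptions: the function name snat_ips_needed is hypothetical, the default of 64000 usable ports per IP is an illustrative round figure rather than a documented NSX-T value, and treating each SNAT IP as a fixed pool of source ports is a simplification.

```python
def snat_ips_needed(peak_connections, ports_per_ip=64000, headroom_percent=20):
    """Estimate the SNAT IP count for a peak number of concurrent connections.

    ports_per_ip is an ASSUMED usable source-port count per SNAT IP.
    headroom_percent reserves a share of each IP's ports as a safety margin.
    """
    if peak_connections < 0:
        raise ValueError("peak_connections must not be negative")
    if ports_per_ip <= 0:
        raise ValueError("ports_per_ip must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in the range 0-99")

    # Ports per IP that remain after holding back the headroom (integer math).
    usable_per_ip = ports_per_ip * (100 - headroom_percent) // 100
    if usable_per_ip == 0:
        raise ValueError("headroom leaves no usable ports per IP")

    # Ceiling division; at least one SNAT IP is always required.
    return max(1, -(-peak_connections // usable_per_ip))


# Hypothetical peak of 150,000 concurrent SNAT connections.
print(snat_ips_needed(150000))
```

Measure the real peak connection count on the affected load balancer before choosing a range, and treat the result as a lower bound rather than an exact requirement.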
Additional information on SNAT allocation modes can be found in the NSX-T Administration Guide: https://techdocs.broadcom.com/us/en/vmware-cis/nsx/nsxt-dc/3-1/administration-guide/load-balancer/setting-up-load-balancer-components/add-a-server-pool.html