NSX Gateway Firewall on T0/T1 Causing TCP Drops with Load Balancer
Article ID: 381564
Products
VMware NSX, VMware NSX-T Data Center
Issue/Introduction
When a client (source) tries to access an application hosted on a pool server behind the load balancer, requests intermittently fail.
A packet capture on the load balancer service interface shows the client's TCP source port being reused for the dropped connection.
The connection does not go through. The client (the source IP in the screenshot above) never receives a SYN-ACK from the load balancer VIP (the destination IP in the screenshot above), so the client keeps retransmitting the SYN until the connection attempt fails.
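To find this symptom in a capture, you can group the client's SYNs by connection and flag attempts that are retransmitted without any SYN-ACK. The following is a minimal sketch that assumes a hypothetical, simplified `Packet` record containing a timestamp, the endpoints, and a flag string. It is not the export format of any particular capture tool, so real traces would first need to be converted into this shape. The sketch treats a SYN on a previously answered 5-tuple as a new attempt, because port reuse is exactly the case being investigated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified record of one TCP packet from a capture."""
    time: float  # seconds since the start of the capture
    src: str
    sport: int
    dst: str
    dport: int
    flags: str   # simplified notation: "S" = SYN, "SA" = SYN-ACK, "A" = ACK


@dataclass
class Attempt:
    conn: tuple          # (src, sport, dst, dport) as seen from the client
    first_syn: float
    syns: int = 1
    answered: bool = False


def unanswered_attempts(packets):
    """Return connection attempts whose SYNs never received a SYN-ACK.

    A SYN on a tuple whose latest attempt is still unanswered counts as a
    retransmission; otherwise it starts a new attempt (the port was reused).
    """
    attempts = []
    latest = {}  # conn -> most recent Attempt on that tuple
    for p in sorted(packets, key=lambda pkt: pkt.time):
        if p.flags == "S":
            conn = (p.src, p.sport, p.dst, p.dport)
            current = latest.get(conn)
            if current is not None and not current.answered:
                current.syns += 1
            else:
                current = Attempt(conn, p.time)
                latest[conn] = current
                attempts.append(current)
        elif p.flags == "SA":
            # The SYN-ACK flows server -> client, so swap endpoints to match.
            current = latest.get((p.dst, p.dport, p.src, p.sport))
            if current is not None:
                current.answered = True
    return [a for a in attempts if not a.answered]


CLIENT, VIP = "192.0.2.10", "198.51.100.5"  # documentation addresses
capture = [
    Packet(0.0, CLIENT, 50000, VIP, 443, "S"),
    Packet(0.001, VIP, 443, CLIENT, 50000, "SA"),
    Packet(0.002, CLIENT, 50000, VIP, 443, "A"),
    # Minutes later the client reuses source port 50000; no SYN-ACK comes back.
    Packet(600.0, CLIENT, 50000, VIP, 443, "S"),
    Packet(601.0, CLIENT, 50000, VIP, 443, "S"),
    Packet(603.0, CLIENT, 50000, VIP, 443, "S"),
]
for a in unanswered_attempts(capture):
    print(a.conn, a.first_syn, a.syns)
# ('192.0.2.10', 50000, '198.51.100.5', 443) 600.0 3
```

The first connection on port 50000 is answered and is therefore not reported. The reused attempt at 600 seconds is reported with three SYNs and no SYN-ACK.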
Environment
VMware NSX, VMware NSX-T Data Center
Cause
The Gateway Firewall drops the new connection because there is still a half-open TCP connection with the same 5-tuple (protocol number, source address, destination address, source port, and destination port).
Below are the packets seen on the load balancer service interface for the new connection. The client sends a SYN to the LB VIP. Because the VIP does not respond with a SYN-ACK, the client retransmits the SYN, and the TCP handshake is never completed.
A few minutes before this new connection, another connection with the same 5-tuple was successfully established. At the end of that connection, however, only the LB VIP sends a FIN-ACK, and no FIN-ACK is seen from the client. The gateway firewall therefore treats that connection as half-open for the following 15 minutes, and any new connection request with the same 5-tuple within that window is dropped. The capture of the previous half-open connection below shows that no FIN-ACK was received from the client.
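The firewall behavior described above can be modeled as a small state table keyed by the 5-tuple. The following is a toy sketch, not NSX's implementation. `HalfOpenTracker`, `fin_from_one_side`, and `allow_new_syn` are hypothetical names, and the only state the model keeps is the time at which one side (the LB VIP) closed the connection. The real firewall tracks considerably more TCP state.

```python
from dataclasses import dataclass, field


@dataclass
class HalfOpenTracker:
    """Toy model (hypothetical, not NSX code) of half-closed TCP entries.

    An entry closed by only one side stays in the table for
    `closing_timeout` seconds. A SYN reusing its 5-tuple is dropped
    until the entry is purged.
    """
    closing_timeout: float  # seconds; stands in for the TCP Closing timer
    _closing_since: dict = field(default_factory=dict, init=False, repr=False)

    def fin_from_one_side(self, five_tuple, now):
        # Only the LB VIP sent FIN-ACK; the client never did.
        self._closing_since[five_tuple] = now

    def allow_new_syn(self, five_tuple, now):
        since = self._closing_since.get(five_tuple)
        if since is None:
            return True  # no state for this 5-tuple
        if now - since >= self.closing_timeout:
            del self._closing_since[five_tuple]  # timer expired: purge entry
            return True
        return False  # stale half-open entry still present: drop the SYN


# 5-tuple: (protocol, source address, destination address, source port, destination port)
key = ("TCP", "192.0.2.10", "198.51.100.5", 50000, 443)

default = HalfOpenTracker(closing_timeout=15 * 60)
default.fin_from_one_side(key, now=0)
print(default.allow_new_syn(key, now=5 * 60))   # False: reused after 5 minutes
print(default.allow_new_syn(key, now=15 * 60))  # True: entry purged at 15 minutes
```

If the tracker is built with `closing_timeout=2 * 60`, the same reuse after 5 minutes is allowed. This mirrors the effect of shortening the TCP Closing timer.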
Resolution
If the client must aggressively reuse TCP source ports (producing the same 5-tuple) and does not cleanly close its connections (by sending a FIN-ACK to the LB VIP), there are two workarounds:
If no gateway firewall rules are configured, disable the gateway firewall so that it does not track half-open connections.
Following "Create a session timer", create a new session timer profile with the TCP Closing timer set according to how quickly the client is expected to reuse TCP ports. For example, if the timer is set to 2 minutes (instead of the default 15 minutes), the firewall purges the half-open connection after 2 minutes, and any new connection with the same 5-tuple after that point is allowed by the firewall.
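To choose a TCP Closing value, one approach is to measure how quickly the client actually reuses a 5-tuple after the LB VIP's FIN. You can then pick a timer no longer than that gap. The following sketch assumes a hypothetical record format, `(five_tuple, syn_time, vip_fin_time)` in seconds, that you would need to build from your own captures. `tightest_reuse_gap` and `closing_timer_fits` are illustrative helpers, not NSX features.

```python
from collections import defaultdict


def tightest_reuse_gap(connections):
    """Shortest time (seconds) between the VIP's FIN on one connection and the
    next SYN that reuses the same 5-tuple; None if no 5-tuple is reused.

    `connections` is an iterable of (five_tuple, syn_time, vip_fin_time)
    records, a hypothetical format built from packet captures.
    """
    by_tuple = defaultdict(list)
    for five_tuple, syn_time, vip_fin_time in connections:
        by_tuple[five_tuple].append((syn_time, vip_fin_time))
    gaps = []
    for conns in by_tuple.values():
        conns.sort()  # order each 5-tuple's connections by SYN time
        for (_, prev_fin), (next_syn, _) in zip(conns, conns[1:]):
            gaps.append(next_syn - prev_fin)
    return min(gaps) if gaps else None


def closing_timer_fits(closing_timer, reuse_gap):
    """True if a half-open entry would be purged before the 5-tuple is reused."""
    return reuse_gap is None or closing_timer <= reuse_gap


a = ("TCP", "192.0.2.10", "198.51.100.5", 50000, 443)
b = ("TCP", "192.0.2.10", "198.51.100.5", 50001, 443)
history = [(a, 0, 30), (a, 330, 360), (b, 10, 20), (b, 200, 210)]

gap = tightest_reuse_gap(history)
print(gap)                               # 180
print(closing_timer_fits(15 * 60, gap))  # False: default 15 minutes is too long
print(closing_timer_fits(2 * 60, gap))   # True: 2 minutes fits
```

In this sample, the tightest reuse is 180 seconds. The 15-minute default would therefore still drop reused connections, while the 2-minute value fits. Captures cover only a sample of traffic, so leaving some margin below the measured gap is probably prudent.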