This article discusses common causes of "Connection reset" errors seen in the API Gateway logs. It is not a complete list, but it covers the areas most worth checking first when troubleshooting the issue.
When using an HTTP routing assertion or any assertion that uses underlying routes, such as the 'Retrieve OAuth 2.0 Token' assertion, you may notice connection failures and a log entry similar to the one below in the SSG logs:
Error occurred. com.l7tech.common.http.GenericHttpException: Unable to obtain HTTP response from https://www.example.com: Connection reset at com.l7tech.common.http.prov.apache.components.f.getResponse(Unknown Source)
java.net.SocketException: Connection reset
This article applies to all API Gateway versions logging the "Connection reset" error.
Connection reset errors are typically caused by the remote server terminating the connection prematurely without notifying the requestor. In other words, the backend closes the connection while the API Gateway is still sending information over the wire. The API Gateway becomes aware of this termination when it tries to send the next packet of data, because the backend rejects that packet, having already terminated the connection from its side. Since the API Gateway did not receive any FIN or RST packets before it sent its last data packet, it assumes the connection is still open. Once it finds out that the backend terminated the connection prematurely, it classifies the failure as a "Connection reset".
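The behaviour described above can be reproduced locally. The following is a minimal Python sketch, not API Gateway code: the function names are hypothetical, and the "backend" is a local test socket that drops the connection with a RST by setting SO_LINGER to zero before closing. It assumes a Linux or macOS host, where the SO_LINGER structure is two C ints.

```python
import socket
import struct
import threading


def abrupt_server(listener: socket.socket) -> None:
    """Accept one connection and drop it with a TCP RST instead of a FIN."""
    conn, _ = listener.accept()
    # l_onoff=1, l_linger=0 makes close() abort the connection with a RST.
    # The "ii" layout is an assumption that holds on Linux and macOS.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def observe_reset() -> str:
    """Connect to the abrupt server and report how the connection ended."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=abrupt_server, args=(listener,))
    server.start()
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    server.join()  # the server side has now aborted the connection

    try:
        client.recv(1024)  # the client still believes the connection is open
        return "closed cleanly"
    except ConnectionResetError:
        return "connection reset"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(observe_reset())
```

The client only learns about the termination when it next touches the socket, which is the same point at which a Java client such as the API Gateway would raise java.net.SocketException: Connection reset.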
There are a number of possible causes for the "Connection reset" error. Almost all of the most common root causes are network-related.
In summary, the issue is usually configuration-related (e.g. timeout settings on the load balancer or backend servers are too low), or it may be caused by a network issue that makes traffic take too long to arrive, so that the read timeout is hit unexpectedly.
Since the issue is most likely outside of the API Gateway, the customer's networking team and backend server admin team should be engaged to confirm that timeout values are high enough and that there is no delay in network traffic.
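As a rough illustration of the timeout check, the sketch below compares the idle timeout of each hop with the time the Gateway is prepared to wait for a response. The function name, hop names, and all values are hypothetical; the real settings have to be read from the Gateway routing configuration, the load balancer, and the backend servers.

```python
from typing import Dict, List


def hops_below_gateway_timeout(
    gateway_read_timeout_s: float,
    hop_idle_timeouts_s: Dict[str, float],
) -> List[str]:
    """Return hops whose idle timeout is not longer than the Gateway's read timeout.

    Such a hop may drop the connection while the Gateway is still
    waiting for, or sending, data.
    """
    return sorted(
        name
        for name, idle_s in hop_idle_timeouts_s.items()
        if idle_s <= gateway_read_timeout_s
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    suspects = hops_below_gateway_timeout(
        60, {"load_balancer": 30, "backend_server": 120}
    )
    print(suspects)
```

With these example values the load balancer gives up after 30 seconds while the Gateway is willing to wait 60, so it is the first setting to review.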
From the API Gateway perspective, the best way to confirm that the Gateway is not at fault is to capture traffic with tcpdump on the API Gateway and then analyze the capture in a tool such as Wireshark. This shows what the data is doing "on the wire" at the API Gateway. The network trace can be reviewed to see whether there is a pattern to the connection reset timing, whether the resets come from one backend or from several, and whether the Gateway received a RST or FIN packet before it attempted to send the next application data packets. It is best to have a network admin review the tcpdump output and determine whether the connection was terminated in a healthy way or in a way that does not follow the TCP/IP specifications. In the vast majority of cases, the connection termination turns out to have been abrupt and not according to the TCP/IP spec, which is why the "Connection reset" error is logged on the API Gateway.
Once (or if) the tcpdump review rules out the API Gateway as the cause, the customer will need to investigate the issue with their other teams on the network side and likely the backend server side.
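The review described above can be sketched in Python. The example assumes the packets of one TCP connection have already been exported from the capture into simple records; the Packet type, its field names, and the sample trace are hypothetical, and only the FIN and RST flag bit values come from the TCP header definition.

```python
from typing import List, NamedTuple

# TCP header flag bits.
FIN = 0x01
RST = 0x04
ACK = 0x10


class Packet(NamedTuple):
    time_s: float      # capture timestamp in seconds
    sender: str        # "gateway" or "backend" (hypothetical labels)
    flags: int         # TCP flags byte
    payload_len: int   # application data length in bytes


def classify_teardown(packets: List[Packet]) -> str:
    """Describe how the backend ended the connection, as seen in the trace."""
    for i, pkt in enumerate(packets):
        if pkt.sender != "backend":
            continue
        if pkt.flags & RST:
            return "backend sent RST (abrupt reset)"
        if pkt.flags & FIN:
            # Check whether the Gateway kept sending data after the FIN.
            sent_after = any(
                p.sender == "gateway" and p.payload_len > 0
                for p in packets[i + 1:]
            )
            if sent_after:
                return "backend sent FIN, gateway sent data afterwards"
            return "backend sent FIN (graceful close)"
    return "no FIN or RST from backend in trace"


if __name__ == "__main__":
    # Hypothetical trace: the backend resets after the Gateway sends data.
    trace = [
        Packet(0.00, "gateway", ACK, 512),
        Packet(0.01, "backend", ACK, 0),
        Packet(30.02, "gateway", ACK, 512),
        Packet(30.03, "backend", RST, 0),
    ]
    print(classify_teardown(trace))
```

Running the same classification over many connections, grouped by backend and by the time between the last data packet and the reset, is one way to look for the timing patterns mentioned above.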