Symptoms:
When the app reuses an existing connection to send a request, the connection is reset immediately or the request times out, with error messages such as the following:
Post https://login.<SYSTEM_DOMAIN>/<PATH>: read tcp IP:PORT->IP:443: read: connection reset by peer
I/O error on POST request for "https://api.<SYSTEM_DOMAIN>/<PATH>": api.<SYSTEM_DOMAIN>:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: api.<SYSTEM_DOMAIN>:443 failed to respond
I/O error on GET request for "https://autoscale.<SYSTEM_DOMAIN>/<PATH>": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
When an app running on Elastic Application Runtime (EAR) accesses another endpoint on the foundation, the network route looks like this:
App (in container) -> Diego Cell -> Gateway (e.g. NAT gateway) -> Load Balancer (LB) -> Gorouter -> Destination
If the app has established an HTTP keep-alive connection to the destination, the LB / NAT gateway (or another network node such as a firewall) may drop the connection after it has been idle for a certain time, without sending a TCP RST back to notify the client (the app).
Usually the LB / NAT gateway drops a connection for the following reasons:
Because the HTTP keep-alive connection is dropped without a TCP RST, the client app still regards it as alive. When the app then tries to send a request over the dropped connection, it either receives a TCP RST immediately or receives nothing at all (Azure LB).
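The failure mode described above can be illustrated with a small model. This is only a sketch, not real network code: the function name `reuse_connection` and its parameters are hypothetical and introduced purely for illustration, and the idle times are plain numbers in seconds rather than measurements.

```python
def reuse_connection(idle_seconds, idle_timeout, rst_on_drop):
    """Outcome when a client reuses a keep-alive connection after being idle.

    idle_seconds: how long the connection sat idle before the next request
    idle_timeout: idle time after which the LB / NAT gateway drops it
    rst_on_drop:  True if the LB / NAT gateway sends a TCP RST when dropping
    """
    if idle_seconds <= idle_timeout:
        # The LB / NAT gateway still tracks the connection.
        return "ok"
    if rst_on_drop:
        # The client was notified of the drop, so it opens a new connection.
        return "ok (reconnected)"
    # The client was never notified and writes to a connection that is gone.
    return "connection reset or read timeout"


# Example values only (assumptions): 240 s idle timeout, 300 s idle period.
print(reuse_connection(60, 240, rst_on_drop=False))
print(reuse_connection(300, 240, rst_on_drop=True))
print(reuse_connection(300, 240, rst_on_drop=False))
```

Only the last case, an idle period longer than the timeout combined with no TCP RST on drop, leads to the symptoms listed at the top.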
Most LBs / NAT gateways send a TCP RST by default when they drop idle connections. We recommend not disabling this feature.
Azure LB did not support sending a TCP RST when dropping idle connections, and this was not configurable. For a workaround, refer to "Azure Networking Connection Idle for more than Four minutes". According to the latest update from Azure on September 24, 2018, "Azure Load Balancer TCP resets on idle in preview", the feature to enable TCP RST is in preview.
If TCP RST is not supported on a network component, we recommend the following configuration:
In Operations Manager, select Elastic Application Runtime > Settings > Network, and set "Frontend Idle Timeout for Gorouter and HAProxy" to a value less than the idle time after which the LB / NAT gateway drops the connection. With "Frontend Idle Timeout for Gorouter and HAProxy" reduced from its default of 15 minutes, Gorouter can close idle connections itself, in a way the client is notified of, before the LB / NAT gateway silently drops them.
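As a rough aid for choosing the value, the rule "less than the idle time of the LB / NAT gateway" can be written as a small calculation. This is a sketch under stated assumptions: the helper name `frontend_idle_timeout`, the 30-second safety margin, and the example hop timeouts are all hypothetical, and it assumes the setting is entered in seconds. Use the idle timeouts actually configured on your own network components.

```python
DEFAULT_FRONTEND_IDLE_TIMEOUT = 15 * 60  # default of 15 minutes, in seconds


def frontend_idle_timeout(hop_idle_timeouts, margin=30):
    """Return a timeout (seconds) below the shortest idle timeout on the path.

    hop_idle_timeouts: mapping of component name -> idle timeout in seconds
    margin:            safety margin in seconds (an assumption, not a rule)
    """
    if not hop_idle_timeouts:
        raise ValueError("at least one hop idle timeout is required")
    shortest = min(hop_idle_timeouts.values())
    value = shortest - margin
    if value <= 0:
        raise ValueError("margin is too large for the shortest idle timeout")
    return value


# Example values only (assumptions), not taken from any real foundation.
hops = {"nat_gateway": 350, "load_balancer": 240}
value = frontend_idle_timeout(hops)
print(value)                                  # 210
print(value < DEFAULT_FRONTEND_IDLE_TIMEOUT)  # True: the default must be lowered
```

The margin keeps the Gorouter timeout comfortably below the shortest timeout on the path, so that Gorouter closes the connection first even if the timers on the different components are not perfectly aligned.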