A race condition can occur between the Gorouter and a Spring backend application when keep-alives are enabled between the two servers. This race condition results in a 502 response code for the request, and the Gorouter stderr and stdout logs contain entries for the failed request that read "EOF".
When keep-alives are enabled in the Gorouter, it keeps the connection between itself and a backend application alive for 90 seconds after a request. If another request for that backend application instance arrives within those 90 seconds, the Gorouter reuses the connection. For more information, refer to Gorouter Back End Keep-Alive Connection.
Note: The default keep-alive timeout setting for Tomcat is 60 seconds.
The race condition occurs when Gorouter attempts to reuse a backend connection that the Tomcat server has already started closing.
In most cases, the connection is closed successfully before Gorouter attempts to reuse it. However, requests that take longer to process, such as POST or PUT requests, can add just enough processing time for the race condition to occur.
The window for this race condition is small, but it exists. For it to occur, two requests spaced 60 seconds apart must enter through the same Gorouter and be destined for the same backend application container.
Note: The second request in this situation is a POST or PUT request.
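The timing relationship described above can be expressed as a simple comparison. The following is a minimal sketch, not part of any Cloud Foundry tooling: keepalive_check is a hypothetical helper name, and the 90-second default is the Gorouter value quoted in this article.

```shell
# Hypothetical helper: compare the backend keep-alive timeout with the
# Gorouter keep-alive window (90 seconds per this article).
keepalive_check() {
  backend_ms=$1         # backend (Tomcat) keep-alive timeout, in milliseconds
  gorouter_s=${2:-90}   # Gorouter keep-alive window, in seconds
  if [ "$backend_ms" -le $((gorouter_s * 1000)) ]; then
    # The backend may start closing an idle connection that Gorouter still reuses.
    echo "race-possible"
  else
    # Gorouter gives up the idle connection before the backend does.
    echo "ok"
  fi
}

keepalive_check 60000    # the Tomcat default noted above
keepalive_check 120000   # the value used later in this article
```

The race is only possible when the backend gives up on an idle connection before the Gorouter does, which is why the remedy is to raise the backend timeout above 90 seconds.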
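To avoid the race condition, set the Tomcat keep-alive timeout higher than the Gorouter's 90 seconds. For Spring Boot apps with embedded Tomcat, this can be done with a customizer such as the following, which sets it to 120 seconds:
// Package names as in Spring Boot 2.x and 3.x.
import org.apache.coyote.AbstractProtocol;
import org.apache.coyote.ProtocolHandler;
import org.springframework.boot.web.embedded.tomcat.ConfigurableTomcatWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;

// This class must be registered as a Spring bean for Spring Boot to apply it.
public class TomcatWebServerFactoryCustomizer
        implements WebServerFactoryCustomizer<ConfigurableTomcatWebServerFactory> {

    @Override
    public void customize(ConfigurableTomcatWebServerFactory factory) {
        factory.addConnectorCustomizers((connector) -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractProtocol) {
                final AbstractProtocol<?> protocol = (AbstractProtocol<?>) handler;
                // 120000 ms = 120 seconds, longer than the Gorouter's 90 seconds
                protocol.setKeepAliveTimeout(120000);
            }
        });
    }
}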
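Alternatively, the environment variable SERVER_TOMCAT_CONNECTION_TIMEOUT can be used. The connection timeout is not the same as the keep-alive timeout, but Tomcat uses the connection timeout as the keep-alive timeout when no keep-alive timeout is set, so this should remedy the situation.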
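As a rough illustration, the variable could be set with the cf CLI as sketched below. The wrapper function name is hypothetical, and the value 120000 is an assumption: Spring Boot is expected to read a bare number as milliseconds, so confirm this for your Spring Boot version. cf set-env and cf restage are standard cf CLI commands.

```shell
# Hypothetical wrapper around two standard cf CLI commands.
set_tomcat_connection_timeout() {
  app=$1       # app name as shown by "cf apps"
  timeout=$2   # e.g. 120000 (assumed to be interpreted as milliseconds)
  cf set-env "$app" SERVER_TOMCAT_CONNECTION_TIMEOUT "$timeout" &&
    cf restage "$app"   # restage so the running app picks up the new value
}

# Usage (drop-con is the sample app name used in the verification procedure):
# set_tomcat_connection_timeout drop-con 120000
```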
For WAR/Tomcat apps, use the Java Buildpack's external configuration option to customize the Tomcat configuration. This feature is documented in External Tomcat Configuration.
You will need to override the `server.xml` file. Copy `server.xml` and modify this line:
<Connector port='${http.port}' bindOnInit='false' connectionTimeout='20000'/>
Change the line to look like this:
<Connector port='${http.port}' bindOnInit='false' connectionTimeout='20000' keepAliveTimeout='120000'/>
Procedure to verify whether the app is closing the connection before the 90 seconds set by the Gorouter
1. Get the app details and the Diego Cell where the app is running
Get the deployed app:
cf apps
Getting apps in org test / space test as admin...
name requested state processes routes
drop-con started web:1/1 drop-con.lab10apps-testlab.lab1.net
Get the GUID:
cf app drop-con --guid
c105317f-1ebc-4604-9bfc-e64b625c17d5
Identify the app details:
cf curl /v2/apps/c105317f-1ebc-4604-9bfc-e64b625c17d5/stats
{"0":{"state":"RUNNING","stats":{"name":"drop-con","uris":["drop-con.lab10apps-testlab.lab1.net"],"host":"10.xxx.241.7","port":61048,"uptime":21298,"fds_quota":16384,"mem_quota":33554432,"disk_quota":1073741824,"log_rate_limit":16384,"usage":{"time":"2025-03-21T15:00:11+00:00","cpu":0.002830326780111023,"cpu_entitlement":0.1767627524842,"mem":17104896,"disk":7507968,"log_rate":0}},"routable":true}}
The "host" value corresponds to the IP of the Diego Cell hosting the app, which can be confirmed with bosh -d <CF-ID> vms:
bosh -d cf-838581a30973d55efef7 vms | grep 10.159.241.7
diego_cell/23038e19-af51-4803-918b-36921c4cd1c9 running az2 10.xxx.241.7 vm-2074a645-981f-4101-86c2-946503a60bb6 xlarge.disk true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.708
2. Identify the IP and interface
SSH into the app and verify the IP used:
cf ssh drop-con
vcap@785cf8a4-12c3-4610-52ce-df72:~$ ip a
...
2616: eth0@if2617: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1404 qdisc noqueue state UP group default
link/ether ee:ee:0a:ff:11:62 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.xxx.17.98 peer 169.254.0.1/32 scope link eth0
valid_lft forever preferred_lft forever
SSH into the Diego Cell and confirm the interface:
ip a | grep -C 2 10.xxx.17.98
2617: s-010255017098@if2616: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1404 qdisc noqueue state UP group default
link/ether aa:aa:0a:ff:11:62 brd ff:ff:ff:ff:ff:ff link-netnsid 12
inet 169.254.0.1 peer 10.xxx.17.98/32 scope link s-010255017098
valid_lft forever preferred_lft forever
3. Perform a packet capture and test the app
tcpdump -i s-010255017098 -n
Alternatively, save the capture to a file for later review:
tcpdump -i s-010255017098 -n -w /tmp/s-010255017098.pcap
No traffic is expected at this point, because this is a new app with no established connections.
From a jumpbox, send a curl request to the app route:
curl -kv https://drop-con.lab10apps-testlab.lab1.net
The command must return 200 OK (not an error) for the Gorouter to start keeping the connection alive.
Packets should then flow for 90 seconds, after which the connection should be terminated from the Gorouter side.
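One way to check which side terminated the connection is to find the first packet with the FIN flag in the capture. The following is a minimal sketch that assumes tcpdump's default text output with -n; first_fin is a hypothetical helper name.

```shell
# Hypothetical helper: read tcpdump text output on stdin and print the
# source address.port of the first packet that carries the FIN flag.
first_fin() {
  # tcpdump prints F first in the flag list, e.g. "Flags [F.]" or "Flags [FP.]".
  # With default timestamps, field 3 is the packet source.
  awk 'index($0, "Flags [F") { print $3; exit }'
}

# Usage against a saved capture file:
# tcpdump -r /tmp/s-010255017098.pcap -n | first_fin
```

If the printed address belongs to the app container (10.xxx.17.98 in this example), the app closed the connection before the Gorouter did, which suggests that the app's keep-alive timeout is shorter than 90 seconds.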