Intermittent 502 EOF Gorouter errors for Spring Apps


Article ID: 298104


Updated On: 03-26-2025

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

A race condition can occur between the Gorouter and a Spring backend application when keep-alives are enabled between the two servers. This race condition results in a 502 response code for the request, and the Gorouter stderr and stdout logs contain entries for the failed request that read "EOF".

When keep-alives are enabled in Gorouter, the Gorouter keeps a connection to a backend application open for 90 seconds after a request completes. If another request for that backend application instance arrives within 90 seconds, the Gorouter reuses that connection. For more information, refer to Gorouter Back End Keep-Alive Connection.

Note: The default keep-alive timeout setting for Tomcat is 60 seconds.

The race condition occurs when Gorouter attempts to reuse a backend connection that the Tomcat server has already started closing.

In most cases, the connection is closed successfully before Gorouter attempts to reuse it. However, requests that take more time, such as POST or PUT requests, can add just enough processing time for this race condition to show itself.

The window for this race condition is small, but it exists. To hit it, two requests spaced about 60 seconds apart must pass through the same Gorouter to the same backend application container.

Note: The second request in this situation is a POST or PUT request.
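The timing relationship above can be sketched as a small shell check. This is only an illustration of the reasoning, not a diagnostic tool: the function name race_possible is hypothetical, and the 60 and 90 second values are the Tomcat and Gorouter defaults quoted in this article. The race is possible whenever the backend closes idle connections no later than Gorouter stops reusing them.

```shell
# Returns success (0) when the backend would close an idle connection
# no later than Gorouter stops reusing it, i.e. when the race is possible.
race_possible() {
  backend_timeout_s=$1   # backend idle (keep-alive) timeout, in seconds
  router_timeout_s=$2    # Gorouter backend keep-alive, in seconds
  [ "$backend_timeout_s" -le "$router_timeout_s" ]
}

# Compare the Tomcat default (60) and a raised value (120) against Gorouter (90).
for t in 60 120; do
  if race_possible "$t" 90; then
    echo "backend=${t}s router=90s: race possible"
  else
    echo "backend=${t}s router=90s: no race"
  fi
done
```

With the defaults, the first line reports that the race is possible. Raising the backend timeout above 90 seconds removes the window.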


Environment

Product Version: 2.9

Resolution

Workaround

The current workaround is to increase the Tomcat connection idle timeout to a value greater than 90 seconds.


Spring Boot > v2.5 apps

For apps using Spring Boot later than v2.5, the Tomcat connection idle timeout can be configured with the environment variable SERVER_TOMCAT_KEEP_ALIVE_TIMEOUT.
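As a minimal sketch, the variable could be set with the cf CLI as shown here. The helper name set_keep_alive_timeout and the value 120s are assumptions chosen for illustration; Spring Boot duration properties accept a unit suffix such as s, and any value above Gorouter's 90 seconds should serve the same purpose.

```shell
# Hypothetical helper: set the Tomcat keep-alive timeout for one app
# and restart it so that the new environment variable is picked up.
set_keep_alive_timeout() {
  app_name=$1          # app name as shown by "cf apps"
  timeout=${2:-120s}   # assumed value; must exceed Gorouter's 90 seconds
  cf set-env "$app_name" SERVER_TOMCAT_KEEP_ALIVE_TIMEOUT "$timeout" &&
    cf restart "$app_name"
}
```

For example, set_keep_alive_timeout drop-con applies the assumed 120 second default to the sample app used later in this article. A restage may be used instead of a restart if preferred.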

Spring Boot < v2.5 apps

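For apps using Spring Boot earlier than version 2.5, a WebServerFactoryCustomizer can be used to configure this timeout programmatically.

Add this class:
import org.apache.coyote.AbstractProtocol;
import org.apache.coyote.ProtocolHandler;
import org.springframework.boot.web.embedded.tomcat.ConfigurableTomcatWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.stereotype.Component;

// Registered as a bean so that Spring Boot applies it to the embedded Tomcat.
@Component
public class TomcatWebServerFactoryCustomizer implements WebServerFactoryCustomizer<ConfigurableTomcatWebServerFactory> {
	@Override
	public void customize(ConfigurableTomcatWebServerFactory factory) {
		factory.addConnectorCustomizers((connector) -> {
			ProtocolHandler handler = connector.getProtocolHandler();
			if (handler instanceof AbstractProtocol) {
				AbstractProtocol<?> protocol = (AbstractProtocol<?>) handler;
				// 120 seconds, in milliseconds; must exceed Gorouter's 90 seconds.
				protocol.setKeepAliveTimeout(120000);
			}
		});
	}
}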


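Alternatively, the environment variable SERVER_TOMCAT_CONNECTION_TIMEOUT may be used. This is not the same setting as the keep-alive timeout, but Tomcat falls back to the connection timeout when no keep-alive timeout is set, so raising it should remedy the situation.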


War/Tomcat apps

For WAR/Tomcat apps, use the Java Buildpack's external configuration option to customize the Tomcat configuration. That feature is documented in External Tomcat Configuration.

You will need to override the `server.xml` file. Copy the existing `server.xml` and modify this line:

<Connector port='${http.port}' bindOnInit='false' connectionTimeout='20000'/>


Change the line to look like this:

<Connector port='${http.port}' bindOnInit='false' connectionTimeout='20000' keepAliveTimeout='120000'/> 

There may also be other application servers or buildpacks that could potentially face this issue. If you suspect that you are hitting this issue, obtain the app logs, Gorouter logs, and contact Tanzu Support.

Additional Information

Procedure to verify whether the app is closing the connection before the 90 seconds set by Gorouter

1. Get the app details and the Diego cell where the app is running

Get the deployed app:

cf apps
Getting apps in org test / space test as admin...
 
name              requested state   processes           routes
drop-con          started           web:1/1             drop-con.lab10apps-testlab.lab1.net

Get the GUID:

cf app drop-con  --guid
c105317f-1ebc-4604-9bfc-e64b625c17d5

Identify the app details:

cf curl /v2/apps/c105317f-1ebc-4604-9bfc-e64b625c17d5/stats
{"0":{"state":"RUNNING","stats":{"name":"drop-con","uris":["drop-con.lab10apps-testlab.lab1.net"],"host":"10.xxx.241.7","port":61048,"uptime":21298,"fds_quota":16384,"mem_quota":33554432,"disk_quota":1073741824,"log_rate_limit":16384,"usage":{"time":"2025-03-21T15:00:11+00:00","cpu":0.002830326780111023,"cpu_entitlement":0.1767627524842,"mem":17104896,"disk":7507968,"log_rate":0}},"routable":true}}
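The host and port fields are the values needed for the next step. As a rough sketch, they could be pulled out of the stats JSON with standard text tools. The function name stats_host_port is hypothetical, and the sketch only reads the first host and port it finds, so it assumes a single-instance app like the one in this example.

```shell
# Reads the stats JSON on stdin and prints "host:port" for the first instance.
stats_host_port() {
  json=$(cat)
  # Extract the first "host":"..." pair and keep only the value.
  host=$(printf '%s' "$json" | grep -o '"host":"[^"]*"' | head -n 1 | cut -d'"' -f4)
  # Extract the first "port":NNNN pair and keep only the number.
  port=$(printf '%s' "$json" | grep -o '"port":[0-9]*' | head -n 1 | cut -d: -f2)
  printf '%s:%s\n' "$host" "$port"
}
```

It could be used as: cf curl /v2/apps/c105317f-1ebc-4604-9bfc-e64b625c17d5/stats | stats_host_port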

The host value is the IP of the Diego cell hosting the app, which can be confirmed with bosh -d <CF-ID> vms:

bosh -d cf-838581a30973d55efef7 vms | grep 10.159.241.7
diego_cell/23038e19-af51-4803-918b-36921c4cd1c9                       running    az2    10.xxx.241.7      vm-2074a645-981f-4101-86c2-946503a60bb6    xlarge.disk    true    bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.708

2. Identify the IP and interface

SSH into the app container and check the IP address it uses:

cf ssh drop-con
vcap@785cf8a4-12c3-4610-52ce-df72:~$ ip a
...
2616: eth0@if2617: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1404 qdisc noqueue state UP group default
    link/ether ee:ee:0a:ff:11:62 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.xxx.17.98 peer 169.254.0.1/32 scope link eth0
       valid_lft forever preferred_lft forever

SSH into the Diego cell and find the matching interface:

ip a | grep -C 2 10.xxx.17.98
2617: s-010255017098@if2616: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1404 qdisc noqueue state UP group default
    link/ether aa:aa:0a:ff:11:62 brd ff:ff:ff:ff:ff:ff link-netnsid 12
    inet 169.254.0.1 peer 10.xxx.17.98/32 scope link s-010255017098
       valid_lft forever preferred_lft forever
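In the output above, the host-side interface name appears to be the container IP with each octet zero-padded to three digits and prefixed with s-. Assuming that pattern holds in your environment (it is inferred from this example only and should be confirmed with ip a as shown), the expected name could be derived as in this sketch. The function name silk_ifname is hypothetical.

```shell
# Prints the interface name expected for a container IP, assuming the
# "s-" + zero-padded octets pattern seen in the example output.
# The IP must be written without leading zeros in its octets.
silk_ifname() {
  # Split the dotted IP into four words and pad each to three digits.
  # shellcheck disable=SC2046
  printf 's-%03d%03d%03d%03d\n' $(printf '%s' "$1" | tr '.' ' ')
}
```

For a container IP of 10.255.17.98, this prints s-010255017098.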

3. Perform a packet capture and test the app

tcpdump -i s-010255017098 -n

Alternatively, save the capture to a file for later review:

tcpdump -i s-010255017098 -n -w /tmp/s-010255017098.pcap

No traffic is expected at this point, because this is a new app with no established connections.

From a jumpbox, send a curl request to the app route:

curl -kv https://drop-con.lab10apps-testlab.lab1.net

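The request must return 200 OK (not an error) for Gorouter to start keeping the backend connection alive.

Packets should keep flowing for 90 seconds, after which the connection is terminated from the Gorouter side. If the app side closes the connection first, the app's idle timeout is shorter than Gorouter's.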
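To see which side closed the connection and when, the capture can be read back with epoch timestamps (tcpdump -r with -tt) and scanned for the first FIN. The sketch below is one possible way to do that. The function name first_fin is hypothetical, and it assumes the usual tcpdump text layout where the timestamp is the first field and the sender is the third.

```shell
# Reads "tcpdump -n -tt" text output on stdin and prints the sender of the
# first FIN packet plus the seconds elapsed since the first captured packet.
first_fin() {
  awk '
    NR == 1 { start = $1 }                      # timestamp of first packet
    index($0, "Flags [F") {                     # first packet carrying FIN
      printf "%s %.0f\n", $3, $1 - start
      exit
    }
  '
}
```

It could be used as: tcpdump -r /tmp/s-010255017098.pcap -n -tt | first_fin

If the printed sender is the app's address and the elapsed time is well under 90 seconds, the app is closing the connection before Gorouter does. The elapsed time is measured from the first captured packet rather than from the end of the last request, so treat it as an approximation.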