Spring Cloud Services "cf update-service" fails due to /copy

Products

Support Only for Spring

Issue/Introduction

Symptoms:
When updating a Spring Cloud Services service instance, for example with cf update-service config-server -c '{ "git": { "uri": "https://github.com/xxx/config-server-configurations.git" } }', the update fails in spring-cloud-service-broker-worker with an I/O error on the POST request to https://api.PCF_SYSTEM_DOMAIN/v2/apps/6fbb83d8-36ad-46cb-a8e9-47d291553c9b/copy_bits. The spring-cloud-service-broker-worker log shows:

2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT 2018-04-23 23:32:15.209 ERROR [spring-cloud-service-broker-worker,dede2c16d0d80aec,dede2c16d0d80aec,false] 15 --- [cTaskExecutor-2] i.p.s.s.messaging.RequestHandler         : Error updating service instance: org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://api.PCF_SYSTEM_DOMAIN/v2/apps/6fbb83d8-###-###-a8e9-47d291553c9b/copy_bits": api.PCF_SYSTEM_DOMAIN:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: api.PCF_SYSTEM_DOMAIN:443 failed to respond
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://api.PCF_SYSTEM_DOMAIN/v2/apps/6fbb83d8-###-###-a8e9-47d291553c9b/copy_bits": api.PCF_SYSTEM_DOMAIN:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: api.PCF_SYSTEM_DOMAIN:443 failed to respond
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:666) ~[spring-web-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at org.springframework.security.oauth2.client.OAuth2RestTemplate.doExecute(OAuth2RestTemplate.java:128) ~[spring-security-oauth2-2.2.0.RELEASE.jar!/:na]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) ~[spring-web-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:380) ~[spring-web-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at io.pivotal.springcloud.servicebroker.cf.legacy.CloudFoundryClientExtensions.copyAppBits(CloudFoundryClientExtensions.java:199) ~[classes/:na]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at io.pivotal.springcloud.servicebroker.cf.legacy.CloudFoundryLegacyService.copyBits(CloudFoundryLegacyService.java:65) ~[classes/:na]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at io.pivotal.springcloud.servicebroker.service.impl.BackingApplicationService.pushApp(BackingApplicationService.java:233) ~[classes/:na]
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT     at io.pivotal.springcloud.servicebroker.service.impl.BackingApplicationService.lambda$deployApps$3(BackingApplicationService.java:170) ~[classes/:na]

Environment

Cause

This problem can occur when http-keep-alive on an external Pivotal Cloud Foundry load balancer, Gorouter, or HAProxy (if one is in use) is configured with a very aggressive value, such as a few seconds. The Spring Cloud Services broker and worker access the PCF API endpoint by reusing existing network connections from a connection pool. With such a short keep-alive, the load balancer, Gorouter, or HAProxy frequently closes a pooled connection just before the client reuses it, so the request receives no response. The broker and worker normally resend a failed request, either on a new connection or on another connection from the pool, but the /copy_bits call is handled by a different code path that does not have retries enabled. As a result, cf update-service fails whenever the /copy_bits request fails.
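
The retry behavior described above can be illustrated with a minimal Java sketch. This is not the Spring Cloud Services broker code; the class StaleConnectionRetry and the method withRetry are hypothetical names chosen for illustration, and the request is simulated rather than sent to a real API. It only shows the general idea that a request failing with an I/O error (org.apache.http.NoHttpResponseException is a subclass of java.io.IOException) can be sent again, and that a call made without such a wrapper fails on the first dropped connection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
class StaleConnectionRetry {

    /**
     * Runs the request and, if it fails with an IOException, runs it again
     * until maxAttempts total attempts have been made. The last IOException
     * is rethrown if every attempt fails.
     */
    static <T> T withRetry(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                // A connection closed by an intermediate proxy surfaces as an
                // IOException; remember it and try again.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first attempt hits a dropped connection,
        // the second attempt succeeds.
        String result = withRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("api.example.invalid:443 failed to respond");
            }
            return "copied";
        }, 2);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With maxAttempts set to 1, the same simulated request throws the IOException immediately, which corresponds to the /copy_bits behavior described above. Note that blindly retrying a POST is only safe when the operation can be repeated without side effects, so a real client would need to weigh that before enabling retries.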

Resolution

Setting http-keep-alive to less than 5 seconds causes problems not only for Spring Cloud Services but also for any other application that uses connection pooling, because connections are reset too frequently. It is highly recommended to set http-keep-alive on the load balancer, Gorouter, and any other intermediate network router or proxy to at least 10 seconds so that connection pools remain stable.
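
As a rough aid when reviewing an environment, the recommendation above can be expressed as a small check. The following Java sketch is an assumption-laden illustration: the class KeepAliveCheck, the method belowRecommended, the component labels, and the sample values are all hypothetical, and the keep-alive values would have to be collected by hand from each component's own configuration. It simply flags every hop whose keep-alive is below the recommended 10 seconds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class KeepAliveCheck {

    // Minimum keep-alive, in seconds, recommended in this article.
    static final int RECOMMENDED_MIN_SECONDS = 10;

    /** Returns the names of all hops whose keep-alive is below the minimum. */
    static List<String> belowRecommended(Map<String, Integer> keepAliveSeconds) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> hop : keepAliveSeconds.entrySet()) {
            if (hop.getValue() < RECOMMENDED_MIN_SECONDS) {
                flagged.add(hop.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Sample values only; real values come from each component's settings.
        Map<String, Integer> hops = new LinkedHashMap<>();
        hops.put("load-balancer", 60);
        hops.put("haproxy", 3);
        hops.put("gorouter", 10);
        System.out.println(belowRecommended(hops)); // prints [haproxy]
    }
}
```

Every hop on the path to the API endpoint matters here, because the shortest keep-alive among them determines how quickly idle pooled connections are closed.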