For example, when updating a Spring Cloud Services instance (such as a Config Server instance):
cf update-service config-server -c '{"git":{"uri": "https://xxxxx"}}'
The update fails after a few minutes, and the Spring Cloud Services Broker Worker reports a network I/O error for the /copy_bits operation against the Pivotal Cloud Foundry (PCF) API endpoint (https://api.SYSTEM_DOMAIN):
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT 2018-04-23 23:32:15.209 ERROR [spring-cloud-service-broker-worker, dede2c16d0d80aec,dede2c16d0d80aec,false] 15 --- [cTaskExecutor-2] i.p.s.s.messaging.RequestHandler : Error updating service instance: org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://api.SYSTEM_DOMAIN/v2/apps/6fbb83d8-36ad-46cb-a8e9-47d291553c9b/copy_bits": api.SYSTEM_DOMAIN:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: api.SYSTEM_DOMAIN:443 failed to respond
2018-04-24T01:32:15.21+0200 [APP/PROC/WEB/0] OUT org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://api.SYSTEM_DOMAIN/v2/apps/6fbb83d8-36ad-46cb-a8e9-47d291553c9b/copy_bits": api.SYSTEM_DOMAIN:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: api.SYSTEM_DOMAIN:443 failed to respond
This issue occurs in environments where the customer's load balancer in front of TAS is configured with an aggressive HTTP keep-alive timeout, such as a few seconds. In such environments, when the SCS Broker or Worker sends a request to the PCF / TAS API endpoint over a connection taken from its connection pool, there is a high chance that the load balancer has already reset that connection because of the aggressive timeout.
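The timing condition described above can be illustrated with a small model. This is only a sketch: the `PooledConnection` class, the `is_closed_by_load_balancer` function, and the timeout values are hypothetical names and numbers chosen for illustration, not part of SCS or the load balancer.

```python
from dataclasses import dataclass


@dataclass
class PooledConnection:
    # Time (in seconds) at which the client last sent a request on this connection.
    last_used_at: float


def is_closed_by_load_balancer(conn, now, lb_keepalive_seconds):
    """Return True if the connection has been idle longer than the
    load balancer's keep-alive timeout, so the load balancer has dropped it."""
    return (now - conn.last_used_at) > lb_keepalive_seconds


conn = PooledConnection(last_used_at=0.0)

# Assumed values: the client reuses the connection after 30 idle seconds.
print(is_closed_by_load_balancer(conn, now=30.0, lb_keepalive_seconds=5.0))   # True
print(is_closed_by_load_balancer(conn, now=30.0, lb_keepalive_seconds=60.0))  # False
```

With a keep-alive of a few seconds, almost any pooled connection that has sat idle is already closed on the load balancer side when the client tries to reuse it, which is when the "failed to respond" error appears.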
Usually, the SCS Broker Worker retries failed requests: it immediately opens a new connection or uses another connection from the pool, and the retry succeeds. However, in environments with an aggressive HTTP keep-alive timeout, the /copy_bits API request can fail and raise the exception shown above because requests to this API endpoint are not retried. The request is not retried because this particular API endpoint generates heavy load on the Cloud Controller. As a result, cf update-service fails due to the /copy_bits failure.
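The difference between retried and non-retried requests can be sketched as follows. This is a simplified illustration of the behavior described above, not the actual SCS Broker Worker code: `NoHttpResponse`, `send_with_retry`, `make_flaky_sender`, the retry count, and the `APP_GUID` placeholder are all assumptions made for the example.

```python
class NoHttpResponse(Exception):
    """Stand-in for a connection that was dropped before any response arrived."""


# Assumption based on the description above: /copy_bits is never retried
# because it puts heavy load on the Cloud Controller.
NON_RETRYABLE_SUFFIXES = ("/copy_bits",)


def send_with_retry(path, send, max_attempts=3):
    """Call send(path), retrying on NoHttpResponse unless the path is non-retryable."""
    attempts = 1 if path.endswith(NON_RETRYABLE_SUFFIXES) else max_attempts
    last_error = None
    for _ in range(attempts):
        try:
            return send(path)
        except NoHttpResponse as err:
            last_error = err
    raise last_error


def make_flaky_sender(failures):
    """Return a sender that fails `failures` times (stale connections), then succeeds."""
    state = {"remaining": failures}

    def send(path):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise NoHttpResponse(f"{path}: failed to respond")
        return "ok"

    return send


# An ordinary request hits one stale connection, is retried, and succeeds.
print(send_with_retry("/v2/apps/APP_GUID", make_flaky_sender(1)))  # ok

# The same single failure on /copy_bits is not retried, so the update fails.
try:
    send_with_retry("/v2/apps/APP_GUID/copy_bits", make_flaky_sender(1))
except NoHttpResponse as err:
    print("update fails:", err)
```

In this model a single stale connection is harmless for most requests but fatal for /copy_bits, which matches the symptom: most API calls go through, while cf update-service fails.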