Affected TAS versions are 2.11.37, 2.13.19, and 3.0.9
Affected IST versions are 2.11.31, 2.13.16, and 3.0.9
First, extract the vcap_request_id of each request that failed with a "context canceled" error:
find . -name "gorouter.stdout.log*" | while read line; do grep backend-endpoint-failed $line | jq -r '. | select(.data.error | contains("context canceled")) | .data.vcap_request_id'; done
find . -name "gorouter.stdout.log*" | while read line; do grep backend-endpoint-failed $line | jq -r '. | select(.data.error | contains("context canceled")) | .data.vcap_request_id'; done | head -1
27116dd3-f047-4a35-7873-e9ef7e1d3f71
Then we find the log line that has the application ID:
find . -name "gorouter.stdout.log*" | xargs egrep -Hn 27116dd3-f047-4a35-7873-e9ef7e1d3f71
./router.XXXXXXXX-XXXX-XXXX-XXXX-543579d74ed0.2023-05-05-18-05-52/gorouter/gorouter.stdout.log:192:{"log_level":3,"timestamp":"2023-05-04T19:38:42.838473790Z","message":"backend-endpoint-failed","source":"vcap.gorouter","data":{"route-endpoint":{"ApplicationId":"d45e4b57-3420-40b3-b13d-9ef0562d58c5",REDACTED,"RouteServiceUrl":""},"error":"incomplete request (context canceled)","attempt":1,"vcap_request_id":"27116dd3-f047-4a35-7873-e9ef7e1d3f71","retriable":true,"num-endpoints":1,"got-connection":false,"wrote-headers":false,"conn-reused":false,"dns-lookup-time":0,"dial-time":0,"tls-handshake-time":0}}
egrep -A5 -Hn 27116dd3-f047-4a35-7873-e9ef7e1d3f71 ./router.XXXXXXXX-XXXX-XXXX-XXXX-543579d74ed0.2023-05-05-18-05-52/gorouter/gorouter.stdout.log | egrep "prune-failed-endpoint|d45e4b57-3420-40b3-b13d-9ef0562d58c5" | egrep prune-failed-endpoint
./router.XXXXXXXX-XXXX-XXXX-XXXX-543579d74ed0.2023-05-05-18-05-52/gorouter/gorouter.stdout.log-193-{"log_level":3,"timestamp":"2023-05-04T19:38:42.838565797Z","message":"prune-failed-endpoint","source":"vcap.gorouter.registry","data":{"route-endpoint":{"ApplicationId":"d45e4b57-3420-40b3-b13d-9ef0562d58c5",REDACTED,"process_instance_id":"2ea1596c-a745-4fdc-53a4-d885","process_type":"web","source_id":"d45e4b57-3420-40b3-b13d-9ef0562d58c5",REDACTED,"RouteServiceUrl":""}}}
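The manual steps above can be combined into one shell function. The following is only a rough sketch, not a supported tool. The function name find_pruned_requests is hypothetical. It pulls fields out of each log line with awk pattern matching instead of a JSON parser, and it matches "context canceled" anywhere on the line rather than only in .data.error, so treat its output as a starting point for manual review.

```shell
# find_pruned_requests DIR   (hypothetical helper, not part of any release)
# For every gorouter.stdout.log* file under DIR, print
#   <log file> <vcap_request_id> <ApplicationId>
# for each "context canceled" backend-endpoint-failed entry that is followed,
# within the next 5 lines, by a prune-failed-endpoint entry for the same app.
find_pruned_requests() {
  find "${1:-.}" -name 'gorouter.stdout.log*' -type f -exec awk '
    # Return the value of a JSON string field, or "" if it is absent.
    # Assumption: the value contains no escaped double quotes.
    function field(line, key,    re, m) {
      re = "\"" key "\":\"[^\"]*\""
      if (match(line, re) == 0) return ""
      m = substr(line, RSTART, RLENGTH)
      sub("^\"" key "\":\"", "", m)
      sub("\"$", "", m)
      return m
    }

    # Do not carry a look-ahead window over from the previous file.
    FNR == 1 { window = 0 }

    # Remember the most recent context-canceled failure.
    /"message":"backend-endpoint-failed"/ && /context canceled/ {
      req = field($0, "vcap_request_id")
      app = field($0, "ApplicationId")
      window = 5            # same look-ahead as the egrep -A5 step
      next
    }

    # Within the window, look for a prune of the same application.
    window > 0 {
      window--
      if ($0 ~ /"message":"prune-failed-endpoint"/ &&
          field($0, "ApplicationId") == app) {
        print FILENAME, req, app
        window = 0
      }
    }
  ' {} +
}
```

Calling `find_pruned_requests .` from the directory that holds the extracted router logs scans the same files as the find commands above.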
Customers hitting this bug can minimize its impact by increasing the number of application instances, although depending on the frequency of these errors, scaling the instance count may not help.
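To get a rough sense of how frequent the errors are per application, the failures can be counted by application ID. This is a minimal sketch under assumptions: the function name count_canceled_by_app is hypothetical, it relies on grep -o (available in GNU and BSD grep), and it assumes each backend-endpoint-failed line carries a single ApplicationId field, as in the sample log line shown earlier.

```shell
# count_canceled_by_app DIR   (hypothetical helper)
# Prints "<count> <ApplicationId>" for "context canceled" failures found in
# gorouter.stdout.log* files under DIR, most affected application first.
count_canceled_by_app() {
  find "${1:-.}" -name 'gorouter.stdout.log*' -type f -exec cat {} + |
    grep '"message":"backend-endpoint-failed"' |
    grep 'context canceled' |
    grep -o '"ApplicationId":"[^"]*"' |   # keep only the ApplicationId pair
    cut -d'"' -f4 |                       # 4th quote-delimited field is the value
    sort | uniq -c | sort -rn
}
```

Running `count_canceled_by_app .` in the log directory gives one line per application. The counts indicate which applications are hit most often; they do not show by themselves whether adding instances will help.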
To eliminate the bug, customers must move off routing-release 0.262.0 using one of these options:
Option 1) Wait for the new routing-release 0.266.0 to be incorporated into the latest TAS releases. These new releases are expected in the mid-May 2023 time frame.
Option 2) Manually update the routing-release, either to 0.261.0 or 0.266.0, using the procedure in this article: https://knowledge.broadcom.com/external/article?articleNumber=293785