Users of applications running on Tanzu Application Service (TAS) for VMs may experience unexpected 502 errors. Upon looking into /var/vcap/sys/log/gorouter/gorouter.stdout.log, you will see "too many transfer encodings" in the error message. This happens after upgrading to TAS for VMs 2.7.30, 2.8.24, 2.9.18 or 2.10.10, or Isolation Segment 2.7.29, 2.8.23, 2.9.17, 2.10.9.
For example:
"error":"net/http: HTTP/1.x transport connection broken: too many transfer encodings: [\"chunked\" \"chunked\"]
This knowledge article is going to discuss how operation teams and developer teams can prepare for this change and detect applications that are incorrectly returning transfer-encoding headers so they can be prepared to upgrade to newer versions of TAS.
For background into the issue and to better understand the details of this problem, please see the knowledge article Applications on TAS for VMs get 502 chunked response error.
This particular issue is difficult to debug because, in older versions of Tanzu Application Service, the Gorouter and Go components would silently ignore multiple transfer-encoding headers. Thus there are no log messages you can search for to detect this problem.
To ensure that customers are able to detect problem applications prior to upgrading to a version of TAS that does not permit multiple transfer-encoding headers, VMware Tanzu Engineering has published specific patch versions of Tanzu Application Service and Isolation Segment Tiles called the mitigation patch
The below tables show which Tile versions introduced this breaking change for Tanzu Application Server and Tanzu Application Isolation Segment.
Tanzu Application Tile Versions
Major Version | First Version with Transfer Encoding Breaking Change | Mitigation Patch | Transfer Encoding Breaking Change Reintroduced |
---|---|---|---|
2.7 | 2.7.30 | 2.7.31 | 2.7.32+ |
2.8 | 2.8.24 | 2.8.25 | 2.8.26+ |
2.9 | 2.9.18 | 2.9.19 | 2.9.20+ |
2.10 | 2.10.10 | 2.10.11 | 2.10.12+ |
2.11 | 2.11.0 | Not Applicable | All version of 2.11 include the Transfer Encoding breaking change |
Tanzu Application Isolation Segment Tile Versions
Major Version | First Version with Transfer Encoding Breaking Change | Mitigation Patch | Transfer Encoding Breaking Change Reintroduced |
---|---|---|---|
2.7 | 2.7.29 | 2.7.30 | 2.7.31+ |
2.8 | 2.8.23 | 2.8.24 | 2.8.25+ |
2.9 | 2.9.17 | 2.9.18 | 2.9.19+ |
2.10 | 2.10.9 | 2.10.10 | 2.10.11+ |
2.11 | 2.11.0 | Not Applicable | All version of 2.11 include the Transfer Encoding breaking change |
To determine if customers will be impacted by this change in Go & Gorouter, all customers should do the following:
For example, if you are on the 2.7 LTS branch of TAS, you should first upgrade to TAS 2.7.31, then validate your applications are correct, and after doing that, you may safely upgrade to 2.7.32+.
As an operations team, you want a solution to monitor all applications running on your platform so that you can catch any application returning multiple transfer-encoding headers and then notify your application development teams. Mitigation patch versions 2.7.31, 2.8.25, 2.9.19, and 2.10.11 of TAS and versions 2.7.30, 2.8.24, 2.9.18, 2.10.10 of Isolation Segment all have the following functionality built in to accomplish this.
You may monitor the Gorouter logs. These versions of TAS will output log entries when multiple transfer-encoding headers or an invalid transfer-encoding header (like gzip or identity) are returned. The application log message includes request information as well as the org, space, and application name so that you can easily notify the correct application development team. The route service log entries include the route URL as well as the x-cf-forwarded-url
header which shows you the request that triggered the problem.
If you are using a service like Splunk or ELK, you can search for the phrases received-deprecated-response-from-app and received-deprecated-response-from-route-service. The former will indicate applications that return multiple transfer-encoding headers and the latter will return route services that return multiple transfer-encoding headers.
router/c99bebc8-2612-45cb-987d-af3b2c17f5e9: stdout | {"log_level":3,"timestamp":"2021-02-10T03:18:00.340162741Z","message":"received-deprecated-response-from-app","source":"vcap.gorouter","data":{"route-endpoint":{"ApplicationId":"ad561f0c-####-443a-9969-0cbc9afd34aa","Addr":"10.###.##.23:61046","Tags":{"app_id":"ad561f0c-####-443a-9969-0cbc9afd34aa","app_name":"route-service-example","component":"route-emitter","instance_id":"0","organization_id":"c968e6b8-d8bf-48f4-8e4f-272bf3295052","organization_name":"#####","process_id":"ad561f0c-####-443a-9969-0cbc9afd34aa","process_instance_id":"ba9f9703-####-4b73-5504-f29e","process_type":"web","source_id":"ad561f0c-####-443a-9969-0cbc9afd34aa","space_id":"98764b84-ab78-4244-a92e-eee94b392abc","space_name":"dev"},"RouteServiceUrl":""},"host":"route-service-example-#########-#########-cr.cfapps-44.slot-59.###.#######.com"}} router/c99bebc8-####-45cb-987d-af3b2c17f5e9: stdout | {"log_level":3,"timestamp":"2021-02-11T14:44:51.251480691Z","message":"received-deprecated-response-from-route-service","source":"vcap.gorouter","data":{"route-service-url":{"Scheme":"https","Opaque":"","User":null,"Host":"route-service-example-#######-########-cr.cfapps-44.slot-59.pez.vmware.com","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"x-cf-forwarded-url":"https://spring-music.cfapps-##.slot-##.###.#####.com/incorrect"}}
If as an operations team you are redirecting all application logs to an aggregation service like Splunk or ELK or you are directly sending Gorouter access logs to syslog, you can also search for x_cf_routererror where the value is too_many_transfer_encodings. Every access log entry written to the application log stream by Gorouter will have the x_cf_routererror field and any request which would have failed due to the restrictions Gorouter is imposing on transfer-encoding headers will have an error message in this field.
2021-02-11T09:40:45.53-0500 [RTR/0] OUT route-service-example-anxious-wildebeest-cr.cfapps-44.slot-59.pez.vmware.com - [2021-02-11T14:40:45.457165501Z] "GET / HTTP/1.1" 200 0 4669 "-" "curl/7.64.1" "10.213.54.5:43688" "10.213.54.23:61046" x_forwarded_for:"10.###.##.253, 10.###.##.5, 10.###.##.19, 10.###.##.5" x_forwarded_proto:"https" vcap_request_id:"0be1fee9-462b-####-5d4f-3d2f6c2d91f4" response_time:0.080853 gorouter_time:0.000789 app_id:"ad561f0c-####-443a-9969-0cbc9afd34aa" app_index:"0" x_cf_routererror:"too_many_transfer_encodings" x_b3_traceid:"a509eb58605e4327" x_b3_spanid:"a509eb58605e4327" x_b3_parentspanid:"-" b3:"a509eb58605e4327-a509eb58605e4327"
If you are not sending application logs to an aggregations service or syslog, then you can capture the Gorouter access logs directly from the VM. They reside under /var/vap/sys/log/gorouter or you can obtain them with bosh logs or through Ops Manager's log retrieval mechanism.
Once you have the logs, you can run grep 'x_cf_routererror:"too_many_transfer_encodings"' access.log. This will return the log lines for requests to impacted applications. Alternatively, you can use the top-logs tool to extract aggregate information from your access logs, including the number of requests that are flagged as impacted.
You can monitor the metric deprecated_responses. Gorouter will emit this metric which is a counter that will indicate how many instances of this transfer-encoding issue have been observed by Gorouter.
Here's an example of what the metric looks like observed with the cf nozzle cf cli plugin.
$ cf nozzle -no-filter | grep deprecated origin:"gorouter" eventType:CounterEvent timestamp:1612881548929244098 deployment:"cf" job:"router" index:"173a2629-####-####-988c-afb5b046e899" ip:"10.#.#.#" tags:<key:"source_id" value:"gorouter" > counterEvent:<name:"deprecated_responses" delta:0 total:1 >
If you are sending TAS foundation metrics to a metrics monitoring platform like Prometheus/Grafana, you may want to add a chart to visualize this metric. To ensure that your foundation is not going to be impacted when you upgrade, you will want this metric to consistently display a zero value. Zero means that there are no requests which have been impacted by the stricter requirements in Go 1.15.
As a development team, you can also watch for this problem. Once your Operations Team has upgraded your foundation to one of these TAS versions 2.7.31, 2.8.25, 2.9.19, and 2.10.11, and one of these Isolation Segment 2.7.30, 2.8.24, 2.9.18, 2.10.10 versions then you will be able to use the following methods to determine if your applications will be impacted.
Search your application logs for lines where the x_cf_routererror field has a value of too_many_transfer_encodings. Every access log entry written to the application log stream by Gorouter will have the x_cf_routererror field and any request that would have failed due to the new restrictions Go 1.15 is imposing on transfer-encoding headers will have an error message in this field.
2021-02-11T09:40:45.53-0500 [RTR/0] OUT route-service-example-anxious-wildebeest-cr.cfapps-44.slot-59.pez.vmware.com - [2021-02-11T14:40:45.457165501Z] "GET / HTTP/1.1" 200 0 4669 "-" "curl/7.64.1" "10.###.##.#:43688" "10.###.##.23:61046" x_forwarded_for:"10.166.51.253, 10.213.54.5, 10.213.54.19, 10.###.##.5" x_forwarded_proto:"https" vcap_request_id:"0be1fee9-462b-####-5d4f-3d2f6c2d91f4" response_time:0.080853 gorouter_time:0.000789 app_id:"ad561f0c-####-443a-9969-0cbc9afd34aa" app_index:"0" x_cf_routererror:"too_many_transfer_encodings" x_b3_traceid:"a509eb58605e4327" x_b3_spanid:"a509eb58605e4327" x_b3_parentspanid:"-" b3:"a509eb58605e4327-a509eb58605e4327"
When making requests to your applications, if a request is impacted Gorouter will insert the header Cf-Deprecated-Response. If you are seeing this header in response headers from requests to your application, then those requests are impacted by this change.
For example:
> GET /albums HTTP/1.1 > Host: spring-music.cfapps-44.slot-##.###.#####.com > User-Agent: curl/7.64.1 > Accept: */* > < HTTP/1.1 200 OK < Cf-Deprecated-Response: too many transfer encodings: ["chunked" "chunked"] < Content-Type: application/json;charset=UTF-8 < Date: Wed, 10 Feb 2021 03:25:03 GMT < X-Vcap-Request-Id: 19a46950-####-####-####-9c557f995bf2 < Transfer-Encoding: chunked