gorouter logs lines with x_cf_routererror:"endpoint_failure (context deadline exceeded)"

Article ID: 378520

Products

VMware Tanzu Application Service

Issue/Introduction

In a GoRouter log, the message 'x_cf_routererror:"endpoint_failure (context deadline exceeded)"' means that the router (GoRouter) did not receive a response from the intended backend endpoint within the configured time limit.

For example:

123456-abcd-4321-cd43-ab4g43r5db21 RTR/16 2024-04-04T17:26:11.144916325Z OUT my-app.my-app-domain.com - [2024-04-04T17:11:11.144056165Z] "GET /an-end-point HTTP/1.1" 502 0 67 "-" "-" "x.x.x.x:35570" "x.x.x.x:61015" x_forwarded_for:"x.x.x.x" x_forwarded_proto:"https" vcap_request_id:"5ab96-95da-1234-453a-a2c123456cca" response_time:900.000604 gorouter_time:0.000209 app_id:"12345678-3c32-4321-ad32-1a4d77d3de12" app_index:"1" instance_id:"ab32d43e-6a3a-32de-35a1-7654" x_cf_routererror:"endpoint_failure (context deadline exceeded)" x_mc_correlation_id:"-" x_correlation_id:"-" correlation_id:"-" x_b3_traceid:"3ab84dea76fg12214554a2b121320abc" x_b3_spanid:"123456789" x_b3_parentspanid:"-" b3:"abcdefghijk"
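
When many such lines have to be reviewed, it can help to pull the key:value fields out of each line programmatically. The following is a minimal Python sketch, assuming the access log layout shown in the example above; the function name parse_router_fields and the "status" key are illustrative and not part of any GoRouter tooling.

```python
import re

# key:"quoted value" or key:bare_value, where the key starts a
# whitespace-separated token (this skips timestamps and host:port pairs).
FIELD_RE = re.compile(r'(?<!\S)(\w+):(?:"([^"]*)"|(\S+))')

# HTTP status code that follows the quoted request line, e.g. ... HTTP/1.1" 502
STATUS_RE = re.compile(r'HTTP/\d(?:\.\d)?" (\d{3}) ')


def parse_router_fields(line: str) -> dict:
    """Return the key:value fields of one access log line as strings."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        fields[key] = quoted if quoted else bare
    status = STATUS_RE.search(line)
    if status:
        # "status" is a key added by this sketch, not a field in the log.
        fields["status"] = status.group(1)
    return fields
```

Applied to the example line, the resulting dictionary holds response_time, app_id, app_index, vcap_request_id and x_cf_routererror. In that example response_time is about 900 seconds, so the request waited roughly 15 minutes before the router gave up.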

Cause

Possible Causes:

  • Slow application response: The backend service or application is too slow in processing the request.
  • Network issues: There may be network connectivity problems between the GoRouter and the backend service.
  • Overloaded endpoint: The backend could be overloaded, leading to longer response times.
  • Misconfiguration: Configuration issues might be causing delays, or the router's timeout setting might be too short for the current workload.
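
One way to start narrowing these causes down is to count the failing requests per application instance. The sketch below is a heuristic under stated assumptions, not a diagnostic tool: it assumes the log lines carry app_id and app_index as in the example above, and the name count_timeouts_by_instance is hypothetical. Failures concentrated on a single app_index may suggest a problem with one instance or its host, while failures spread across all instances may point more toward application-wide slowness or overload.

```python
import re
from collections import Counter

ERROR_MARKER = 'x_cf_routererror:"endpoint_failure (context deadline exceeded)"'
APP_ID_RE = re.compile(r'(?<!\S)app_id:"([^"]*)"')
APP_INDEX_RE = re.compile(r'(?<!\S)app_index:"([^"]*)"')


def count_timeouts_by_instance(lines):
    """Count lines with the timeout error, keyed by (app_id, app_index)."""
    counts = Counter()
    for line in lines:
        if ERROR_MARKER not in line:
            continue  # not a "context deadline exceeded" line
        app_id = APP_ID_RE.search(line)
        app_index = APP_INDEX_RE.search(line)
        key = (
            app_id.group(1) if app_id else "-",
            app_index.group(1) if app_index else "-",
        )
        counts[key] += 1
    return counts
```

For instance, passing an open log file object to count_timeouts_by_instance and calling most_common() on the result lists the instances with the most timeouts first.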

Resolution

This usually points to the application as the root cause of the issue. Below are some tips for troubleshooting it.
  • When the issue happens, collect a few thread dumps and analyse them (we don't support analysing thread dumps). An easier way to do that is explained in the following doc.
  • When the issue happens, run strace on the app process. Use the following KB article to find the root OS PID (that is, the PID outside the container) and then run strace on it.
  • Start a tcpdump at any time, wait until the issue happens, and then inspect the capture for any possible communication issues.
  • Ask the dev team to include as many debug lines as possible in the code handling the endpoint, to see where it gets stuck.
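
As an illustration of the last tip, the sketch below shows one possible way to add debug lines around each step of an endpoint handler. It assumes a Python application; the logger name endpoint_debug and the helper debug_step are hypothetical, and the same idea applies in any language. A "start" line is logged before the work begins, so a step that hangs leaves a "start" entry with no matching "end" entry.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("endpoint_debug")  # hypothetical logger name


@contextmanager
def debug_step(name, request_id="-"):
    """Log when a step starts and, once it finishes, how long it took."""
    start = time.monotonic()
    logger.debug("start step=%s request_id=%s", name, request_id)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        logger.debug(
            "end step=%s request_id=%s elapsed=%.3fs", name, request_id, elapsed
        )
```

A handler would wrap each potentially slow call, for example "with debug_step("db_query", request_id): ...". If the application can read the request ID that appears as vcap_request_id in the router log (assumed here to arrive in the X-Vcap-Request-Id request header), passing it as request_id makes it easier to match application debug lines to the failing router log line.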