'kubectl logs' results in "Bad Gateway" response in Tanzu Kubernetes Grid Integrated Edition (TKGI)

Article ID: 298603


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Running the command kubectl logs -v=9 pod-xxx results in an HTTP 502 "Bad Gateway" response.

Standard execution of kubectl (without increased log verbosity) shows "InternalError" after a timeout:

Example
$ kubectl -n my-namespace logs my-pod
Error from server (InternalError): an error on the server ("<HTML>\n<HEAD>\n<TITLE>Could Not Connect</TITLE>\n</HEAD>\n\n<BODY BGCOLOR=\"white\" FGCOLOR=\"black\">\n<H1>Could Not Connect</H1>\n<HR>\n\n<FONT FACE=\"Helvetica,Arial\"><B>\nDescription: Could not connect to the requested server host.\n</B></FONT>\n<HR>\n</BODY>\n") has prevented the request from succeeding ( pods/log my-pod)

$

Turning up the log verbosity on kubectl shows "502 Bad Gateway":

Example:
$ kubectl -n my-namespace -v=9 logs my-pod
...
> GET /api/v1/namespaces/my-namespace/pods/my-pod/log HTTP/1.1
> Host: <<kube-apiserver address>>:8443
> Accept: application/json, */*
> User-Agent: kubectl/v1.14.10 (linux/amd64) kubernetes/575467a
> Authorization: Bearer [redacted]
> 
< HTTP/1.1 502 Bad Gateway
< Audit-Id: 4be812ad-754d-409f-a431-436476468eb4
< Content-Type: application/json
< Date: Tue, 30 Jun 2020 20:09:41 GMT
< Content-Length: 769

Environment

Product Version: 1.6

Resolution

In summary, the issue is caused by an unexpected proxy within the architecture. An HTTP/HTTPS proxy had already been configured within Ops Manager.

The solution here was to add the cluster service network CIDR(s) to the no_proxy field in the Networking tab of the TKGI tile, and then Apply Changes.
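For illustration only, assuming a hypothetical cluster service network CIDR of 10.100.200.0/24 (verify the actual CIDR configured for your clusters), the no_proxy field would then include an entry such as:

127.0.0.1,localhost,10.100.200.0/24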

Other issues can present with similar symptoms, so here is how to troubleshoot this scenario:

1. Use "bosh ssh" to get onto the cluster master, and turn up the debug level on kube-apiserver:
cd /var/vcap/jobs/kube-apiserver/config

cp -p bpm.yml bpm.yml.orig

vim bpm.yml

2. Search for the line containing "--v". Change this from --v=2 to --v=9 and save the file.

3. Restart kube-apiserver with monit restart kube-apiserver. Wait for it to restart with: watch monit summary. Then run the kubectl logs command again.

4. When finished, restore bpm.yml from the backup copy and run monit restart kube-apiserver again. A consolidated example of these steps is shown below.
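The following is a minimal sketch of steps 1 through 4, assuming a hypothetical cluster deployment name service-instance_xxxxxxxx and a single master node (use bosh deployments and bosh vms to find the real names in your environment):

# From the Ops Manager VM or a jumpbox with BOSH CLI access
bosh -d service-instance_xxxxxxxx ssh master/0

# On the master node
sudo -i
cd /var/vcap/jobs/kube-apiserver/config
cp -p bpm.yml bpm.yml.orig
vim bpm.yml                   # change --v=2 to --v=9 and save
monit restart kube-apiserver
watch monit summary           # wait until kube-apiserver shows "running"

# Reproduce the issue with kubectl logs and review kube-apiserver.stderr.log,
# then revert to the original configuration:
cp -p bpm.yml.orig bpm.yml
monit restart kube-apiserver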

Looking at kube-apiserver.stderr.log now shows the related request headers and responses.
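The log can be followed on the master node with, for example (assuming the standard BOSH job log directory for kube-apiserver):

tail -f /var/vcap/sys/log/kube-apiserver/kube-apiserver.stderr.log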

Example

We see that it attempted to reach the "my-container" container, which is expected, and that it reaches out to the host worker node on port 10250, which is the default kubelet port.
I0706 19:30:28.396349       7 handler.go:143] kube-apiserver: GET "/api/v1/namespaces/my-namespace/pods/my-pod/log" satisfied by gorestful with webservice /api/v1
...

This is where the infrastructure was shown to have a FireProxy server. This can be seen in the Response Headers further below:
I0706 19:30:28.398247       7 round_trippers.go:419] curl -k -v -XGET  'https://<<worker-node-address>>:10250/containerLogs/my-namespace/my-pod/my-container'

...

We can also see the 502 response:
I0706 19:31:05.750307       7 round_trippers.go:438] GET https://<<worker-node-address>>:10250/containerLogs/my-namespace/my-pod/my-container 502 internal error - server connection terminated in 37352 milliseconds
I0706 19:31:05.750340       7 round_trippers.go:444] Response Headers:
I0706 19:31:05.750346       7 round_trippers.go:447]     Cache-Control: no-store
I0706 19:31:05.750350       7 round_trippers.go:447]     Content-Length: 247
I0706 19:31:05.750354       7 round_trippers.go:447]     Connection: Keep-Alive
I0706 19:31:05.750358       7 round_trippers.go:447]     Date: Mon, 06 Jul 2020 19:31:05 GMT
I0706 19:31:05.750362       7 round_trippers.go:447]     Server: FireProxy
I0706 19:31:05.750366       7 round_trippers.go:447]     Content-Type: text/html
I0706 19:31:05.750370       7 round_trippers.go:447]     Content-Language: en
I0706 19:31:05.750373       7 round_trippers.go:447]     Proxy-Connection: Keep-Alive
I0706 19:31:05.750377       7 round_trippers.go:447]     Via: http/1.1 Fireproxy (Fireproxy)
I0706 19:31:05.750800       7 trace.go:81] Trace[639109317]: "Get /api/v1/namespaces/my-namespace/pods/my-pod/log" (started: 2020-07-06 19:30:28.396451222 +0000 UTC m=+112.936169849) (total time: 37.35431412s):
Trace[639109317]: [13.117µs] [13.117µs] About to process Get options
Trace[639109317]: [14.178µs] [1.061µs] About to Get from storage
Trace[639109317]: [1.75256ms] [1.738382ms] About to write a response
Trace[639109317]: [37.354312001s] [37.352559441s] Transformed response object
Trace[639109317]: [37.35431412s] [2.119µs] END
I0706 19:31:05.750976       7 wrap.go:47] GET /api/v1/namespaces/my-namespace/pods/my-pod/log: (37.354726124s) 502

We can observe that kube-apiserver had an internal server error:
goroutine 15386 [running]:

...
created by k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
        /workspace/anago-v1.15.5-beta.0.35+20c265fef0741d/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:1859 +0x4f4

logging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"an error on the server (\\\"\\u003cHTML\\u003e\\\\n\\u003cHEAD\\u003e\\\\n\\u003cTITLE\\u003eCould Not Connect\\u003c/TITLE\\u003e\\\\n\\u003c/HEAD\\u003e\\\\n\\\\n\\u003cBODY BGCOLOR=\\\\\\\"white\\\\\\\" FGCOLOR=\\\\\\\"black\\\\\\\"\\u003e\\\\n\\u003cH1\\u003eCould Not Connect\\u003c/H1\\u003e\\\\n\\u003cHR\\u003e\\\\n\\\\n\\u003cFONT FACE=\\\\\\\"Helvetica,Arial\\\\\\\"\\u003e\\u003cB\\u003e\\\\nDescription: Could not connect to the requested server host.\\\\n\\u003c/B\\u003e\\u003c/FONT\\u003e\\\\n\\u003cHR\\u003e\\\\n\\u003c/BODY\\u003e\\\\n\\\") has prevented the request from succeeding ( pods/my-pod)\",\"reason\":\"InternalError\",\"details\":{\"name\":\"my-pod\",\"kind\":\"pods/log\"},\"code\":502}\n"
 [kubectl/v1.15.5 (linux/amd64) kubernetes/20c265f <<worker-node-address>>:58498]
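As an additional, optional verification (not part of the original capture above), the same kubelet endpoint can be requested manually from the master node; the proxy address below is a placeholder. Without a proxy variable set, the kubelet answers directly, typically with 401 Unauthorized or 403 Forbidden since no client credentials are supplied. With the same https_proxy value that kube-apiserver uses, the request returns the proxy's error page instead, confirming that the proxy is intercepting traffic to port 10250:

# Direct request to the kubelet (expect 401/403 from the kubelet itself)
curl -kv https://<<worker-node-address>>:10250/containerLogs/my-namespace/my-pod/my-container

# Same request sent through the configured proxy (reproduces the failure)
https_proxy=http://<<proxy-address>>:<<proxy-port>> curl -kv https://<<worker-node-address>>:10250/containerLogs/my-namespace/my-pod/my-container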