vCenter task "Download plug-in" keeps failing for 'VMware TKG plugin' with status code 502

Article ID: 323416

Updated On:

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This KB explains what the TKG plugin does and how to renew its TLS certificate once it has expired.

Symptoms:
- "Download plug-in" tasks keep failing in vCenter with this error:
Error downloading plug-in. URL is unreachable. org.apache.http.client.HttpResponseException: status code: 502, reason phrase: Bad Gateway org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)

- Opening any of the options under vCenter UI > Cluster > Configure > TKG Service fails with the error "Bad Gateway 502".

Environment

VMware vSphere 7.0 with Tanzu

Cause

The TKG plugin exposes several settings in the vCenter UI under Hosts and Clusters > Cluster > Configure > TKG Service, such as setting the default Tanzu Kubernetes cluster CNI plugin and registering clusters in Tanzu Mission Control. It communicates with the tkgs-plugin-server pod in the backend through a masterproxy-tkgs-plugin pod, which acts as a reverse proxy so that calls from the TKG interface in the vSphere Client are routed to tkgs-plugin-server.

When the TLS certificate in tkgs-plugin-tls-secret expires, this communication fails with status code 502 Bad Gateway. The masterproxy-tkgs-plugin logs report an error like the following:

2023-03-01T11:45:30.846010245Z stderr F 2023/02/08 11:45:30 [error] 8#0: *5167 upstream SSL certificate verify error: (10:certificate has expired) while SSL handshaking to upstream, client: 127.0.0.1, server: localhost, request: "GET /plugin.json HTTP/1.0", upstream: "https://10.96.0.77:8099/plugin.json", host: "127.0.0.1:9900"
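
The log signature above can be checked for mechanically. The following is a minimal sketch, not an official procedure: the function name has_expired_upstream_cert is hypothetical, and because this KB does not state the namespace or full name of the masterproxy-tkgs-plugin pod, the usage comment leaves them as placeholders to be looked up first.

```shell
# Sketch: succeed (exit 0) if the log text on stdin contains the
# expired-certificate signature quoted in this KB.
# The function name is hypothetical, chosen for illustration.
has_expired_upstream_cert() {
    # -F matches the string literally, so the parentheses need no escaping.
    grep -qF 'upstream SSL certificate verify error: (10:certificate has expired)'
}

# Example usage on the supervisor control plane. Locate the pod first;
# <namespace> and <pod-name> are placeholders, not values from this KB:
#   kubectl get pods -A | grep masterproxy-tkgs-plugin
#   kubectl logs -n <namespace> <pod-name> | has_expired_upstream_cert \
#       && echo "upstream certificate has expired"
```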

To confirm that tkgs-plugin-tls-secret contains an expired TLS certificate:

1- Follow KB 90194 to access the supervisor cluster via SSH.

2- Check the expiry date of the TLS certificate for TKG plugin:
# kubectl get secret -n vmware-system-tkg tkgs-plugin-tls-secret -o jsonpath='{.data.tls\.crt}' |base64 -d |openssl x509 -noout -text |grep After
            Not After : Dec 20 11:12:12 2022 GMT <---- expired
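
Instead of reading the date by eye, the comparison can be scripted. This is a minimal sketch that assumes GNU date (the -d option) is available on the node; the function name not_after_expired is hypothetical. The usage comment relies on openssl x509 -enddate, which prints the expiry as notAfter=<date>.

```shell
# Sketch: succeed (exit 0) if an openssl "Not After" timestamp is in the past.
# Assumes GNU date; the function name is hypothetical.
not_after_expired() {
    # Convert the timestamp to seconds since the epoch; return 2 if unparsable.
    expiry_epoch=$(date -u -d "$1" +%s) || return 2
    now_epoch=$(date -u +%s)
    [ "$expiry_epoch" -le "$now_epoch" ]
}

# Example usage on the supervisor control plane:
#   not_after=$(kubectl get secret -n vmware-system-tkg tkgs-plugin-tls-secret \
#       -o jsonpath='{.data.tls\.crt}' | base64 -d \
#       | openssl x509 -noout -enddate | cut -d= -f2)
#   not_after_expired "$not_after" && echo "expired: $not_after"
```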

Resolution

The certificate is generated as part of the TKG plugin deployment, so force-replacing that deployment should regenerate the certificate:

1- Access the supervisor cluster as instructed above.
2- Go to the directory that holds the TKGs plugin deployment definition:
# cd /usr/lib/vmware-wcp/objects/PodVM-GuestCluster/13-tkgs-plugin

3- Force replace the deployment of the TKG plugin using the definition file:
# kubectl replace -f tkgs-plugin-deployment.yaml --force

4- Ensure the certificate is now rotated:
# kubectl get secret -n vmware-system-tkg tkgs-plugin-tls-secret -o jsonpath='{.data.tls\.crt}' |base64 -d |openssl x509 -noout -text |grep After
            Not After : Mar  2 08:46:17 2024 GMT
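
To keep track of when the rotated certificate will next need attention, the remaining validity can be computed from the "Not After" value. This is a sketch under the same assumption of GNU date; the function name days_until_expiry and its optional second argument (a reference time, defaulting to now) are hypothetical.

```shell
# Sketch: print the whole days from a reference time to a "Not After"
# timestamp. A negative result means the certificate had already expired
# at the reference time. Assumes GNU date; names are hypothetical.
days_until_expiry() {
    expiry_epoch=$(date -u -d "$1" +%s) || return 2
    # The second argument is optional and defaults to the current time.
    ref_epoch=$(date -u -d "${2:-now}" +%s) || return 2
    echo $(( (expiry_epoch - ref_epoch) / 86400 ))
}

# Example usage with the value from the step above:
#   days_until_expiry "Mar  2 08:46:17 2024 GMT"
```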


Additional Information

Impact/Risks:
Until the certificate is renewed, the TKG Service settings exposed in vCenter through the plugin cannot be configured.