vCenter task "Download plug-in" keeps failing for 'VMware TKG plugin' with status code 502

Article ID: 323416

Products

VMware vSphere Kubernetes Service
Tanzu Kubernetes Runtime

Issue/Introduction

This article explains the TKG plugin functionality and how to renew its TLS certificate when the "Download plug-in" task fails with a 502 error.

Symptoms:
- Download Plug-in tasks keep failing in vCenter with this error:

Error downloading plug-in. URL is unreachable. org.apache.http.client.HttpResponseException: status code: 502, reason phrase: Bad Gateway org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)


- Viewing any of the options in the vCenter UI under Cluster > Configure > TKG Service returns a "502 Bad Gateway" error.

Environment

vSphere Supervisor

Cause

The TKGS plugin provides visibility and configuration from the vCenter UI under Hosts and Clusters > Cluster > Configure > TKG Service, for setting the Default Tanzu Kubernetes cluster CNI plugin and registering clusters in Tanzu Mission Control. It communicates with the tkgs-plugin-server pod in the backend through a masterproxy-tkgs-plugin pod, which acts as a reverse proxy to ensure that calls from the TKG Service interface in the vSphere Client are properly routed to the tkgs-plugin-server.
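
To view these components in your environment, you can list the plugin pods from the Supervisor cluster context and, assuming the backend is exposed through a Kubernetes service whose name contains "tkgs-plugin", the service that the proxy forwards to:

kubectl get pods -A | grep "tkgs-plugin"

kubectl get svc -A | grep "tkgs-plugin"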

When the TLS certificate in tkgs-plugin-tls-secret expires, this communication fails with a 502 Bad Gateway status code.

The logs of one or more masterproxy-tkgs-plugin pods should report the following error:

kubectl get pods -A | grep "tkgs-plugin"

kubectl logs -n <tkgs-plugin-namespace> <masterproxy-tkgs-plugin-pod-name>

YYYY-MM-DDTHH:MM:SS.ssssssZ stderr F YYYY/MM/DD HH:MM:SS [error] 8#0: *5167 upstream SSL certificate verify error: (10: certificate has expired) while SSL handshaking to upstream, client: 127.0.0.1, server: localhost, request: "GET /plugin.json HTTP/1.0", upstream: "https://<tkgs-plugin-service-IP-address>:<tkgs-plugin-port>/plugin.json"
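
As an optional cross-check from a Supervisor control plane VM, you can inspect the expiry of the certificate that the upstream presents, substituting the service IP address and port from the log line above:

echo | openssl s_client -connect <tkgs-plugin-service-IP-address>:<tkgs-plugin-port> 2>/dev/null | openssl x509 -noout -enddate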

Resolution

The certificate is generated as part of the TKG plugin deployment, so force-replacing that deployment regenerates the certificate.

Verify that the tkgs-plugin-tls-secret shows an expired TLS certificate: 

  1. Connect into the Supervisor cluster context
  2. Check the expiration date of the TLS certificate for the TKGS plugin:
    kubectl get secret -A | grep "tkgs-plugin-tls-secret"
    
    kubectl get secret -n <plugin namespace> tkgs-plugin-tls-secret -o jsonpath='{.data.tls\.crt}' |base64 -d |openssl x509 -noout -text |grep After
    
    Not After: MON DD HH:MM:SS YYYY GMT
  3. If the above certificate is valid but the pod logs still report upstream certificate expiry errors, the certificate information has likely not been propagated to the tkgs-plugin pods; this is most often observed after recent certificate changes. In that case, restart the tkgs-plugin pods to resolve the issue by performing a rollout restart of the tkgs-plugin deployment/daemonset (see the restart sketch after this list).
  4. If the above certificate for the TKGS plugin is expired, connect directly to a Supervisor control plane VM if you have not already done so:
    See KB Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs
  5. Redeploy the TKGS plugin to force renew its certificate:
    kubectl replace -f /usr/lib/vmware-wcp/objects/PodVM-GuestCluster/13-tkgs-plugin/tkgs-plugin-deployment.yaml --force

    Note: If tkgs-plugin-deployment.yaml is not found at the above path, run find / -type f -name "*.yaml" 2>/dev/null | grep tkgs-plugin to locate the absolute path of the file, then perform the renew operation with that path. In some cases, you may need to run this find command on each Supervisor control plane VM (SPVM) to locate the file.

  6. Confirm that the certificate has been renewed successfully:
    kubectl get secret -n <plugin namespace> tkgs-plugin-tls-secret -o jsonpath='{.data.tls\.crt}' |base64 -d |openssl x509 -noout -text |grep After
    
    Not After: MON DD HH:MM:SS YYYY GMT
  7. Check that all tkgs-plugin pods have restarted and are in the Running state. There should be one masterproxy-tkgs-plugin pod per Supervisor control plane VM, and the replica count for tkgs-plugin-server should match its deployment:
    kubectl get pods -A | grep "tkgs-plugin"
    
    masterproxy-tkgs-plugin-<A> 1/1 Running
    masterproxy-tkgs-plugin-<B> 1/1 Running
    masterproxy-tkgs-plugin-<C> 1/1 Running
    tkgs-plugin-server<A-A> 1/1 Running
    tkgs-plugin-server<B-B> 1/1 Running
    
    kubectl get deploy -A | grep "tkgs-plugin-server"
    
    NAME                 READY   UP-TO-DATE   AVAILABLE
    tkgs-plugin-server   2/2     2            2
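
For step 3, the following is a minimal restart sketch, assuming masterproxy-tkgs-plugin runs as a DaemonSet and tkgs-plugin-server as a Deployment; verify the actual resource types and the plugin namespace in your environment first:

kubectl get deploy,ds -A | grep "tkgs-plugin"

kubectl rollout restart daemonset/masterproxy-tkgs-plugin -n <plugin namespace>

kubectl rollout restart deployment/tkgs-plugin-server -n <plugin namespace>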



Additional Information

Impact/Risks:
The TKG Service configurations exposed in vCenter through the plugin cannot be set.

After the VKS Supervisor service is installed, the tkgs-plugin pods are moved to the VKS service namespace, which begins with "svc-tkg-domain-c#" and varies by environment.
 
Prior to the VKS service installation, the tkgs-plugin pods are under the "vmware-system-tkg" namespace.
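
To confirm which namespace applies in a given environment, you can list the namespaces matching both patterns:

kubectl get ns | grep -E "svc-tkg-domain|vmware-system-tkg"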