nginx error "502 Bad Gateway" when navigating to Workload Management > Namespace > Resources tab

Article ID: 408063

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When navigating to Workload Management > Namespace > (select a namespace) > Resources tab, the error message "502 Bad Gateway" is displayed. A relevant screenshot can be seen below.

  • When running curl from outside the Supervisor control plane VMs against the floating IP, the request returns "502 Bad Gateway" instead of the expected JSON output. The nginx instance listening on port 443 of the control plane (master) node has a pre-existing forwarding rule for /appplatform[0-9].

    curl -k -i https://<IP>/appplatform1/plugin.json

    HTTP/1.1 200 Connection established

    HTTP/1.1 502 Bad Gateway
    Server: nginx/1.22.0
    Date: <day>, <date> <time> <timezone>
    Content-Type: text/html
    Content-Length: 157
    Connection: keep-alive

    <html>
    <head><title>502 Bad Gateway</title></head>
    <body>
    <center><h1>502 Bad Gateway</h1></center>
    <hr><center>nginx/1.22.0</center>
    </body>
    </html>

  • The nginx logs for masterproxy-cci-ns-plugin report an expired certificate, which prevents nginx from completing the GET request for plugin.json on port 8053. The relevant log snippets are shown below.

    [error] 6#0: *10749 upstream SSL certificate verify error: (10:certificate has expired) while SSL handshaking to upstream, client: 127.0.0.1, server: localhost, request: "GET /plugin.json HTTP/1.0", upstream: "https://<IP>:8053/plugin.json", host: "127.0.0.1:9901"
    [error] 6#0: *10751 upstream SSL certificate verify error: (10:certificate has expired) while SSL handshaking to upstream, client: 127.0.0.1, server: localhost, request: "GET /plugin.json HTTP/1.0", upstream: "https://<IP>:8053/plugin.json", host: "127.0.0.1:9901"
    [error] 6#0: *10753 upstream SSL certificate verify error: (10:certificate has expired) while SSL handshaking to upstream, client: 127.0.0.1, server: localhost, request: "GET /plugin.json HTTP/1.0", upstream: "https://<IP>:8053/plugin.json", host: "127.0.0.1:9901"

  • The <IP> here is the service endpoint (cluster IP) of the cci-ns-plugin. You can confirm that the plugin is present with the command below; the expected output follows.

    kubectl get vcuiplugins.appplatform.wcp.vmware.com -A

    NAMESPACE                        NAME            AGE
    svc-cci-service-domain-c<ID>   cci-ns-plugin   377d
    svc-tkg-domain-c<ID>           tkgs-plugin     377d

  • When running curl against the cci-ns-plugin service endpoint, the certificate presented by the endpoint is the one that has expired. The issuer of the certificate is the vCenter Server itself.

    curl -v -k https://<IP>:8053/plugin.json

    *   Trying <IP>:8053...
    * Connected to <IP> (<ip>) port 8053 (#0)
    * ALPN: offers http/1.1
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (IN), TLS handshake, CERT verify (15):
    * TLSv1.3 (IN), TLS handshake, Finished (20):
    * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.3 (OUT), TLS handshake, Finished (20):
    * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
    * ALPN: server accepted http/1.1
    * Server certificate:
    *  subject: CN=<IP>; O=VMware; C=US
    *  start date: <date and time>
    *  expire date: <date and time>
    *  issuer: CN=<vCenter server FQDN>; DC=vsphere; DC=local; C=US; ST=California; O=<vCenter server FQDN>
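
The checks above can be condensed into a few text filters. The following is a minimal sketch and not part of any VMware tooling: the function names are hypothetical, and each filter only reads text on stdin (the output of curl -i, nginx error-log lines, and the output of curl -v, respectively).

```shell
# Hypothetical helpers for the symptoms described above (names are illustrative).

# Print the final HTTP status code found in `curl -i` output on stdin.
# A proxy may emit "HTTP/1.1 200 Connection established" first, so only
# the last status line is reported.
last_http_status() {
    awk '/^HTTP\/[0-9.]+ [0-9][0-9][0-9]/ { code = $2 } END { if (code != "") print code }'
}

# Print the unique upstream URLs that nginx rejected because of an expired
# certificate, reading nginx error-log lines on stdin.
expired_upstreams() {
    grep 'certificate has expired' |
        sed -n 's/.*upstream: "\([^"]*\)".*/\1/p' |
        sort -u
}

# Print the subject and expiry/issuer lines from `curl -v` output on stdin.
cert_summary() {
    sed -n -e 's/^\* *subject: /subject: /p' \
           -e 's/^\* *expire date: /expires: /p' \
           -e 's/^\* *issuer: /issuer: /p'
}
```

For example, `curl -k -i https://<IP>/appplatform1/plugin.json | last_http_status` would print 502 while the issue is present, and `curl -v -k https://<IP>:8053/plugin.json 2>&1 | cert_summary` shows the certificate fields (curl writes its verbose output to stderr, hence the `2>&1`). How the nginx log lines are collected depends on the environment and is not covered here.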

Environment

VMware vSphere Kubernetes Service

Cause

The service endpoint certificate for the Cloud Consumption Interface (CCI) plugin, or the Local Consumption Interface (LCI) plugin in version 9.0.1 and later, has expired. Any problem with this certificate prevents the interface from loading correctly when you open the Resources tab for a namespace.
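
Because the cause is certificate expiry, it can help to test the expiry directly rather than reading dates by eye. The following is a minimal sketch, assuming the openssl command-line tool is available where the check is run; the function name is hypothetical.

```shell
# Hypothetical helper: read a PEM certificate on stdin and report whether it
# is expired now, or will expire within the given number of seconds
# (default 0, i.e. "already expired?").
# Note: unreadable input is also reported as "expired-or-expiring", because
# openssl exits non-zero in that case too.
cert_expiry_status() {
    if openssl x509 -noout -checkend "${1:-0}" >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired-or-expiring"
    fi
}

# Example usage (endpoint placeholder as used in this article):
#   openssl s_client -connect <IP>:8053 </dev/null 2>/dev/null | cert_expiry_status
#   openssl s_client -connect <IP>:8053 </dev/null 2>/dev/null | cert_expiry_status 2592000   # 30 days
```

Passing a look-ahead window such as 30 days makes the same check usable as an early warning before the Resources tab starts failing.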

Resolution

Currently, the following workaround is available to regenerate/renew the certificate.

  • Navigate to Workload Management > Services.
  • Under the CCI/LCI service, navigate to Actions > Manage Versions, select your version, and click "Deactivate". Then use the "Deactivate the entire service" option to fully deactivate the CCI/LCI service.
  • Once the service is successfully deactivated, delete it by navigating to Actions > Delete.

    Once the service is deleted, add the CCI/LCI service back by following the steps below.

  • Under Workload Management > Services, follow the link "Discover and download available Supervisor Services here". Follow the instructions there to download the CCI/LCI service from the Broadcom Support Portal.
  • Once the download completes, navigate to Workload Management > Services > Add New Service. Upload the service file downloaded in the previous step and click "Finish". The new CCI/LCI service should now be listed.
  • Under the newly created service, navigate to Actions > Manage service. Select the new service under "Install version" and select the Supervisor instance. Click "Review" and then "Finish".
  • Once the service is re-initialized, a new VCUI plugin for cci-ns-plugin should appear in the Supervisor cluster.
  • When running curl against the new cci-ns-plugin service endpoint, the certificate should now be renewed/regenerated.

After completing the steps above, navigating to Workload Management > Namespace > (select a namespace) > Resources tab should no longer show the error message "502 Bad Gateway".
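
The result can also be confirmed from the command line by repeating the original curl request and looking only at the status code. The following is a minimal sketch; the function names are hypothetical, and -k skips certificate verification as in the curl examples in this article.

```shell
# Hypothetical helper: print only the HTTP status code returned for a URL.
http_code() {
    curl -k -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: succeed only when the plugin manifest is served again.
plugin_ok() {
    [ "$(http_code "$1")" = "200" ]
}

# Example usage (floating IP placeholder as used in this article):
#   plugin_ok "https://<IP>/appplatform1/plugin.json" && echo "plugin.json is reachable"
```

A 200 response here indicates that nginx could complete the upstream TLS handshake again; a 502 suggests the upstream certificate is still being rejected.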



Additional Information

If the steps above do not resolve the issue, ensure that the /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist file is populated and correct on all control plane VMs, as described in the following article (for vCenter 8.0 U3 build 24262322 and above), and then repeat the workaround above:
https://knowledge.broadcom.com/external/article/381404/after-vc-upgrade-to-80u3-build-24262322.html
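
A quick first check of that file can be scripted. The following is a minimal sketch with a hypothetical function name; it only tests that the file exists and is non-empty, and does not validate the contents, which should be compared against the linked article.

```shell
# Hypothetical helper: report whether a file exists and is non-empty.
# Returns non-zero when the file is missing or empty.
file_populated() {
    if [ -s "$1" ]; then
        echo "populated: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# Example usage, to be run on each control plane VM:
#   file_populated /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
```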

To read more about vSphere Supervisor Services, refer to the following link: vSphere Supervisor Services