Kubernetes API Requests failing with error "There was an error when trying to connect to the server.\nPlease check the server URL and try again"

Article ID: 400038


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Kubernetes API requests through the kubectl vsphere plugin fail sporadically with the following error:

# kubectl-vsphere login --server <Supervisor_Control_Plane_Node_IP_Address> --vsphere-username <username> --insecure-skip-tls-verify -v 10

DEBU[YYYY-MM-DD HH:MM:SS] User passed verbosity level: 10              
DEBU[YYYY-MM-DD HH:MM:SS] Setting verbosity level: 10                  
DEBU[YYYY-MM-DD HH:MM:SS] Setting request timeout:                     
DEBU[YYYY-MM-DD HH:MM:SS] login called as: kubectl-vsphere login --server <Supervisor_Control_Plane_Node_IP_Address> --vsphere-username <username> --insecure-skip-tls-verify -v 10 
DEBU[YYYY-MM-DD HH:MM:SS] Creating wcp.Client for <Supervisor_Control_Plane_Node_IP_Address>.         
INFO[YYYY-MM-DD HH:MM:SS] Got unexpected HTTP error: Head "https://<Supervisor_Control_Plane_Node_IP_Address>/sdk/vimServiceVersions.xml": EOF 
ERRO[YYYY-MM-DD HH:MM:SS] Error occurred during HTTP request: Get "https://<Supervisor_Control_Plane_Node_IP_Address>/wcp/loginbanner": EOF 
There was an error when trying to connect to the server.\nPlease check the server URL and try again.FATA[YYYY-MM-DD HH:MM:SS] Error while connecting to host <Supervisor_Control_Plane_Node_IP_Address>: Get "https://<Supervisor_Control_Plane_Node_IP_Address>/wcp/loginbanner": EOF. 

  • External services and automation tools (such as Argo CD, Crossplane, etc.) are also affected and place additional load on the Kubernetes kube-apiserver through frequent API requests, especially long-running 'watch' HTTP requests.

  • When checking the logs of the kubectl-plugin-vsphere-* pods, the error "512 worker_connections are not enough while connecting to upstream" is observed (a rough connection-count check is sketched after the log excerpt below):

    Retrieve all pods:
    root@SVCP [ ~ ]# k get pods -A | grep plugin-vsphere
    kube-system   kubectl-plugin-vsphere-[SVCP-VM2]   1/1   Running   0   7d
    kube-system   kubectl-plugin-vsphere-[SVCP-VM1]   1/1   Running   0   7d
    kube-system   kubectl-plugin-vsphere-[SVCP-VM3]   1/1   Running   0   7d

    Check the logs of these pods:
    root@SVCP [ ~ ]# k logs -n kube-system kubectl-plugin-vsphere-[SVCP-VM2] | grep "worker_connections are not" | tail -n10
    [...]
    YYYY-MM-DD HH:MM:SS [alert] 6#0: *973680 512 worker_connections are not enough while connecting to upstream, client: x.x.x.x, server: default, request: "GET /apis/vmoperator.vmware.com/v1alpha2/namespaces/<vsphere-namespace>/virtualmachines?allowWatchBookmarks=true&resourceVersion=100400720&watch=true HTTP/2.0", upstream: "https://127.0.0.1:6443/apis/vmoperator.vmware.com/v1alpha2/namespaces/<vsphere-namespace>/virtualmachines?allowWatchBookmarks=true&resourceVersion=100400720&watch=true", host: "x.x.x.x"
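
    As a rough, optional check (a sketch, not an official diagnostic), the number of established connections to the kube-apiserver upstream on port 6443 can be counted directly on a Supervisor control plane node. This assumes the proxy pod shares the host network namespace, as the 127.0.0.1:6443 upstream in the alert above suggests:
    root@SVCP [ ~ ]# ss -Htn state established '( dport = :6443 )' | wc -l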

Environment

  • VMware vSphere Kubernetes Service

Cause

  • The reverse proxy (nginx) in the kubectl-plugin-vsphere pods has a default limit of 512 concurrent active HTTP connections. Once this limit is exceeded, new connections are blocked and fail.
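
    For reference, a minimal sketch of the corresponding default setting in /etc/vmware/wcp/nginx/nginx.conf (surrounding directives omitted; the exact file content may differ by release):
    events {
        worker_connections 512;
    }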

Resolution

  • The issue is resolved in Supervisor Kubernetes version 1.30 (released with vCenter Server 8.0 Update 3g), which increases the limit of concurrent active HTTP connections from 512 to 1024.
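
    To check which Supervisor Kubernetes version a cluster is currently running, the node versions can be listed, for example from a Supervisor control plane node:
    root@SVCP [ ~ ]# k get nodes -o wide
    The VERSION column shows the Kubernetes version of the Supervisor control plane nodes.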

Workaround

The workaround is to modify the configuration file /etc/vmware/wcp/nginx/nginx.conf on each Supervisor control plane node in each affected cluster.
Important: These changes are reverted whenever the node is re-deployed, for example during a rolling Supervisor update, and must then be reapplied.

    1. Back up the current configuration:
      cp /etc/vmware/wcp/nginx/nginx.conf /etc/vmware/wcp/nginx/nginx.conf.bak2
    2. Edit /etc/vmware/wcp/nginx/nginx.conf on the Supervisor node and change the events block at the top of the file to:
      events {
          worker_connections 1024;
      }

      This increases the limit of concurrent active connections from 512 to 1024.
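
      To confirm the edit, the events block can be re-checked, for example:
      grep -A 2 '^events' /etc/vmware/wcp/nginx/nginx.conf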

    3. Restart the kubectl-plugin-vsphere pod and wait for it to come back up (expect brief API request interruptions):
      crictl stop $(crictl ps -q --name kubectl-plugin-vsphere)
      crictl ps --name kubectl-plugin-vsphere
      crictl logs -f --since 1m $(crictl ps -q --name kubectl-plugin-vsphere)
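
    Optionally, to verify that the new limit is in effect, the recent pod logs can be checked for further worker_connections alerts, for example (a count of 0 means no new alerts in the last 10 minutes):
      crictl logs --since 10m $(crictl ps -q --name kubectl-plugin-vsphere) 2>&1 | grep -c "worker_connections are not enough"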