Unable to run kubectl commands in a Supervisor Cluster due to "error: You must be logged in to the server (Unauthorized)"

Products

VMware vSphere Kubernetes Service

Issue/Introduction

kubectl commands cannot be run on the Supervisor cluster. They fail with a credential error.

On the vCenter web UI under Workload Management, the below symptoms are observed:

The affected Supervisor cluster is in an Error state and shows errors similar to the following:

Configured Control Plane VMs
Cluster <cluster id> is unhealthy: Get "http://localhost:1080/external-cert/http1/<supervisor control plane vm IP>/6443/version?timeout=2m0s": context deadline exceeded (Client.Timeout exceeded while awaiting headers).

System error occurred on Master node with identifier <master_node_ID>. Details: Failed to sync changes: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'daemonset', '--namespace', 'vmware-system-logging', '-o', 'json']' returned non-zero exit status1.. Will be retried..

The above error message indicates a failure to run a kubectl command while using the /etc/kubernetes/admin.conf file on the specified Supervisor control plane VM.

Namespaces within the affected Supervisor cluster are stuck in Configuring state with the following error:
```
Failed to reconcile annotations on workload <namespace>
```

When running commands from the Supervisor cluster context or while connected to a Supervisor control plane VM over SSH, the following symptoms are observed:

Any kubectl command returns an error similar to the below:

kubectl get pods -n <namespace>
EMMDD HH:MM:SS.ms #### memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
EMMDD HH:MM:SS.ms #### memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
EMMDD HH:MM:SS.ms #### memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
EMMDD HH:MM:SS.ms #### memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
EMMDD HH:MM:SS.ms #### memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
error: You must be logged in to the server (the server has asked for the client to provide credentials)

Environment

vSphere Supervisor

Cause

kubectl commands fail when the certificates embedded in the /etc/kubernetes/admin.conf file have expired.

System checks, including health checks, use this file to run kubectl commands in the Supervisor cluster.

The file is located on each Supervisor control plane VM, so it must be checked for expiration on every VM individually.

Resolution

SSH into each Supervisor control plane VM and check if any certificates in the admin.conf file have expired.

#Checks that the admin.conf file's certificates are not expired
cat /etc/kubernetes/admin.conf | grep certificate-authority-data | awk '{print $2}' | base64 -d | openssl x509 -noout -text | grep After

cat /etc/kubernetes/admin.conf | grep client-certificate-data | awk '{print $2}' | base64 -d | openssl x509 -noout -text | grep After
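
The two checks above print a "Not After" date that then has to be compared with the current time by eye. As a hedged alternative, the sketch below lets openssl do the comparison through its -checkend option. The function name kubeconfig_cert_status and its output words are hypothetical and chosen for illustration only; the sketch assumes awk, base64, and openssl are available on the control plane VM.

```shell
# Sketch only: kubeconfig_cert_status is a hypothetical helper, not a VMware tool.
# Usage: kubeconfig_cert_status <kubeconfig-path> <field>
# Prints "valid", "expired", or "unreadable" for the certificate stored in <field>.
kubeconfig_cert_status() {
    conf="$1"
    field="$2"
    if [ ! -r "$conf" ]; then
        echo "unreadable"
        return 2
    fi
    # Take the first line whose key matches and base64-decode its value to PEM.
    pem=$(awk -v key="${field}:" '$1 == key { print $2; exit }' "$conf" | base64 -d 2>/dev/null) || pem=""
    # Reject empty or non-certificate data before testing the expiry.
    if [ -z "$pem" ] || ! printf '%s\n' "$pem" | openssl x509 -noout >/dev/null 2>&1; then
        echo "unreadable"
        return 2
    fi
    # -checkend 0 exits non-zero when the certificate has already expired.
    if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired"
        return 1
    fi
}
```

For example, calling kubeconfig_cert_status /etc/kubernetes/admin.conf client-certificate-data prints "expired" when the client certificate needs to be renewed. The same call with certificate-authority-data checks the CA certificate.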

Once it has been confirmed that at least one of the above certificates has expired, run the below command to renew the certificates:
```
kubeadm certs renew all
```

Confirm that the certificates are now renewed:

cat /etc/kubernetes/admin.conf | grep certificate-authority-data | awk '{print $2}' | base64 -d | openssl x509 -noout -text | grep After

cat /etc/kubernetes/admin.conf | grep client-certificate-data | awk '{print $2}' | base64 -d | openssl x509 -noout -text | grep After

To ensure that the renewed certificates are persisted upon reboot, you can run:
```
/usr/lib/vmware-wcp/hypercrypt.py --reencrypt
```

Restart the following system containers to ensure that these system processes use the renewed certificates:

crictl rm -f $(crictl ps --label io.kubernetes.container.name=kube-controller-manager -q) 
crictl rm -f $(crictl ps --label io.kubernetes.container.name=kube-scheduler -q) 
crictl rm -f $(crictl ps --label io.kubernetes.container.name=etcd -q) 
crictl rm -f $(crictl ps --label io.kubernetes.container.name=kube-apiserver -q)

Confirm that the containers are running again:

crictl ps | grep -E "kube-|etcd"
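
The container listing can also be checked mechanically instead of by reading it. The sketch below is a minimal illustration: missing_components is a hypothetical helper name, and it only searches the text piped into it for the four container names restarted above, without assuming any particular crictl column layout.

```shell
# Sketch only: missing_components is a hypothetical helper, not part of crictl.
# Reads a container listing on stdin and prints each expected container name
# that does not appear in it. Prints nothing when all four are present.
missing_components() {
    listing=$(cat)
    for name in kube-controller-manager kube-scheduler etcd kube-apiserver; do
        # -w matches the name as a whole word; -q suppresses grep's output.
        if ! printf '%s\n' "$listing" | grep -qw -- "$name"; then
            echo "$name"
        fi
    done
}
```

A possible use is crictl ps | missing_components, repeated after a short wait if any name is still printed, since the containers can take a moment to come back.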

The steps above only renew a portion of the expired Kubernetes certificates.
After the above steps have been performed on all Supervisor control plane VMs, the certmgr script must be run on the Supervisor cluster again.
See the following KB regarding downloading and using the certmgr script. Ensure that you are using the latest certmgr script.
Replace vSphere with Tanzu Supervisor Certificates