After deleting and re-creating a workload cluster with the same name, accessing it via "kubectl vsphere login" results in the error "You must be logged in to the server" even with a correct login

Article ID: 422737

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When a workload cluster is created in the same namespace under a name that was previously used, the creation succeeds, but logging in via kubectl fails.
  • Performing a login via "kubectl vsphere login" fails with the following error despite correct credentials:
    # kubectl vsphere login ...
    <Timestamp>  922913 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
    error: You must be logged in to the server (the server has asked for the client to provide credentials)
  • When accessing the affected workload cluster via SSH and checking the kube-apiserver pod logs, certificate errors referencing "kubernetes-extensions" can be observed:
    # kubectl logs -n kube-system kube-apiserver-<tkc>-#####-##### | grep extension | tail -n5
    E1002 ##:##:##.00       1 authentication.go:74] "Unable to authenticate the request" err="[invalid bearer token, Post \"https://localhost:5443/tokenreview?timeout=30s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes-extensions\")]"
  • Comparing the creation timestamps with the certificate validity period shows that the respective times differ. Example output with placeholder timestamps:
    # kubectl get cluster -n <namespace> <tkc> -o jsonpath='{.metadata.creationTimestamp}'
    <Timestamp of cluster creation>

    # kubectl get certificate -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -iE "creationTimestamp|notAfter|notBefore"
    creationTimestamp: "<Timestamp for when tkg-controller created the secret>"
    notAfter: "<certificate expiration timestamp>"
    notBefore: "<Timestamp when cert-manager generated the secret>"

    # kubectl get secret -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -i "creationTimestamp"
    creationTimestamp: "<Timestamp when secret was created>"
    Specific observations:
    1. The current workload cluster and its associated certificate resource share the same creation date.
    2. However, a review of the certificate shows its validity period started before the current cluster was created.
    3. Similarly, the corresponding secret's creation date predates the existence of the current cluster.

    Conclusion: The certificate and secret were generated long before the current cluster was created. This implies that both resources already existed and were not properly cleaned up during a past cluster deletion.
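    As a quick sanity check, the timestamp comparison described above can be scripted. The sketch below uses hypothetical placeholder timestamps and GNU date syntax; substitute the actual values returned by the kubectl commands shown above:

    ```shell
    # Sketch: flag an orphaned certificate by comparing the cluster's
    # creationTimestamp with the certificate's notBefore timestamp.
    # Both values below are hypothetical placeholders.
    CLUSTER_CREATED="2024-06-01T10:00:00Z"   # .metadata.creationTimestamp of the cluster
    CERT_NOT_BEFORE="2024-01-15T08:30:00Z"   # notBefore of <tkc>-auth-svc-cert

    # Convert both timestamps to epoch seconds (GNU date).
    cluster_epoch=$(date -u -d "$CLUSTER_CREATED" +%s)
    cert_epoch=$(date -u -d "$CERT_NOT_BEFORE" +%s)

    if [ "$cert_epoch" -lt "$cluster_epoch" ]; then
      echo "Certificate predates the cluster - likely orphaned from a previous deletion"
    else
      echo "Certificate was issued after cluster creation - timestamps are consistent"
    fi
    ```

    If the certificate's notBefore lies before the cluster's creation timestamp, the certificate is a leftover from a previously deleted cluster of the same name.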

Environment

vSphere Kubernetes Service 3.5 and earlier

Cause

  • When a workload cluster is deleted, the corresponding secret <tkc>-auth-svc-cert on the Supervisor is deleted as well. However, because the cert-manager service does not expect this deletion, it re-creates the same secret. Due to this re-creation, the proper ownership relationship to the deleted workload cluster is missing, leaving the secret in an orphaned state. The workload cluster deletion itself still proceeds and succeeds.
  • However, if a new workload cluster is provisioned under the same name, the resource secret/<tkc>-auth-svc-cert already exists and is therefore re-used. This results in mismatching certificates between the Supervisor and the workload cluster, and in subsequent failed login attempts.
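Based on this cause, a hypothetical pre-flight check before re-creating a cluster under a previously used name could look as follows (a sketch; NAMESPACE and TKC are placeholder values for your environment):

```shell
# Sketch: before provisioning a workload cluster under a previously used
# name, check whether a leftover <tkc>-auth-svc-cert secret remains in
# the Supervisor namespace. NAMESPACE and TKC are placeholders.
NAMESPACE="${NAMESPACE:-my-namespace}"
TKC="${TKC:-my-cluster}"
SECRET="${TKC}-auth-svc-cert"

# Run this against the Supervisor cluster context.
if kubectl get secret -n "$NAMESPACE" "$SECRET" >/dev/null 2>&1; then
  echo "WARNING: secret $SECRET already exists; it may be orphaned from a deleted cluster"
else
  echo "OK: no leftover secret $SECRET found"
fi
```

If the secret is present before the new cluster has been created, it is a candidate leftover; contact Broadcom Support before removing it, as described in the Workaround section.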

Resolution

This issue is fixed in vSphere Kubernetes Service 3.6. Please refer to the corresponding Release Notes.

Workaround

For a workaround, please reach out to Broadcom Support and reference this KB article. The workaround involves several potentially invasive steps and should therefore only be carried out together with, or after consulting, Broadcom Support.

Additional Information

Japanese version: 同じ名前のワークロード クラスターを削除して再作成した後、「kubectl vsphere login」経由でアクセスすると、正しくログインしても「サーバーにログインしている必要があります」というエラーが発生します。