After deleting and re-creating a workload cluster with the same name, accessing it via "kubectl vsphere login" results in the error "You must be logged in to the server" even with correct credentials

Article ID: 422737

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When a workload cluster is created in the same namespace under a name that was previously used there, the creation succeeds but logging in via kubectl fails.
  • A login via "kubectl vsphere login" fails with the following error even though the credentials are correct:
    # kubectl vsphere login ...
    E1002 11:43:39.772988  922913 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
    error: You must be logged in to the server (the server has asked for the client to provide credentials)
  • When connecting to the affected workload cluster via SSH and checking the kube-apiserver pod logs, certificate errors referencing "kubernetes-extensions" can be observed:
    # kubectl logs -n kube-system kube-apiserver-<tkc>-#####-##### | grep extension | tail -n5
    E1002 ##:##:##.00       1 authentication.go:74] "Unable to authenticate the request" err="[invalid bearer token, Post \"https://localhost:5443/tokenreview?timeout=30s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes-extensions\")]"
  • Comparing the creation timestamps with the certificate validity period shows that they do not match. Example output with illustrative timestamps:
    # kubectl get cluster -n <namespace> <tkc> -o jsonpath='{.metadata.creationTimestamp}'
    2025-10-10T12:00:00Z <--- when cluster was created

    # kubectl get certificate -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -iE "creationTimestamp|notAfter|notBefore"
      creationTimestamp: "2025-10-10T12:00:00Z"  <--- when tkg-controller created the secret
      notAfter: "2035-09-25T12:00:00Z"
      notBefore: "2025-09-27T12:00:00Z"  <--- when cert-manager generated the secret

    # kubectl get secret -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -i "creationTimestamp"
    creationTimestamp: "2025-09-27T12:00:00Z" <--- when secret was created
    Specific observations:
    1. The workload cluster was created on 2025-10-10.
    2. The corresponding certificate resource was created on the same day (2025-10-10), but its validity period started on 2025-09-27, before the cluster was created.
    3. The corresponding secret was also created on 2025-09-27, before the cluster existed.

    Conclusion: The certificate and secret were generated well before the current cluster was created. This implies that both already existed and were not properly cleaned up when the previous cluster of the same name was deleted.
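
The timestamp comparison above can be scripted. The following is a minimal sketch, not an official diagnostic tool: the helper names to_digits and secret_predates_cluster are hypothetical, and it assumes both timestamps are in the UTC form YYYY-MM-DDTHH:MM:SSZ as printed by kubectl. The sample values are the illustrative ones from the output above.

```shell
# to_digits: reduce "2025-10-10T12:00:00Z" to "20251010120000" so that two
# timestamps in the same fixed UTC format can be compared as integers.
to_digits() {
  printf '%s' "$1" | tr -cd '0-9'
}

# secret_predates_cluster CLUSTER_TS SECRET_TS
# Returns 0 when the secret was created before the cluster, which matches the
# leftover-secret symptom described in this article.
secret_predates_cluster() {
  [ "$(to_digits "$2")" -lt "$(to_digits "$1")" ]
}

# Against a live Supervisor, the two values could be collected like this
# (placeholders as used elsewhere in this article):
#   cluster_ts=$(kubectl get cluster -n <namespace> <tkc> -o jsonpath='{.metadata.creationTimestamp}')
#   secret_ts=$(kubectl get secret -n <namespace> <tkc>-auth-svc-cert -o jsonpath='{.metadata.creationTimestamp}')

# Illustrative values from the example output.
cluster_ts="2025-10-10T12:00:00Z"
secret_ts="2025-09-27T12:00:00Z"

if secret_predates_cluster "$cluster_ts" "$secret_ts"; then
  echo "stale: secret predates cluster"
else
  echo "ok: secret is not older than cluster"
fi
```

A "stale" result only indicates that the secret is older than the cluster. The kube-apiserver log errors shown earlier remain the direct evidence of the certificate mismatch.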

Environment

vSphere Supervisor

vSphere Kubernetes Service 3.5 and earlier

Cause

When a workload cluster is deleted, the corresponding secret <tkc>-auth-svc-cert on the Supervisor is deleted as well. However, because the cert-manager service does not expect this deletion, it re-creates the same secret. The re-created secret is no longer associated with the deleted workload cluster and is left in an orphaned state. The workload cluster deletion itself still proceeds and succeeds.

However, if a new workload cluster is then provisioned under the same name, the secret <tkc>-auth-svc-cert already exists and is re-used. This results in mismatched certificates between the Supervisor and the workload cluster, and consequently in failed login attempts.
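
Since the problem depends on the secret already being present, its existence can be checked before a cluster name is re-used. The following is a minimal sketch under stated assumptions: the function name leftover_secret_exists is hypothetical, and the kubectl context is assumed to point at the Supervisor with read access to secrets in the namespace. It only detects the secret; it does not remediate anything.

```shell
# leftover_secret_exists NAMESPACE CLUSTER_NAME
# Returns 0 when secret/<CLUSTER_NAME>-auth-svc-cert is present in NAMESPACE,
# and non-zero when kubectl cannot find it (or cannot reach the Supervisor).
leftover_secret_exists() {
  kubectl get secret -n "$1" "$2-auth-svc-cert" >/dev/null 2>&1
}

# Example usage (placeholders as used elsewhere in this article):
#   if leftover_secret_exists <namespace> <tkc>; then
#     echo "secret <tkc>-auth-svc-cert already exists; a new cluster named <tkc> would re-use it"
#   fi
```

A non-zero result is also returned when kubectl fails for other reasons, such as an expired login, so a negative result is worth confirming with a plain "kubectl get secret -n <namespace>".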

Resolution

Engineering is aware of this issue, and it will be addressed in a future release.

Workaround

For a workaround, please refer to the 'Resolution' section of the following KB article, which covers a similar issue:
https://knowledge.broadcom.com/external/article?articleNumber=385874