After deleting and re-creating a workload cluster with the same name, accessing it via "kubectl vsphere login" results in the error "You must be logged in to the server" even with correct credentials

Article ID: 422737

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When a workload cluster is created in a namespace under a name that was previously used by a deleted workload cluster in the same namespace, the creation succeeds, but logging in to the cluster via kubectl fails.

Logging in via "kubectl vsphere login" fails with the following error, even though the credentials are correct:

```
# kubectl vsphere login ...
E1002 11:43:39.772988  922913 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
error: You must be logged in to the server (the server has asked for the client to provide credentials)
```

After accessing the affected workload cluster via SSH, the kube-apiserver pod logs show certificate errors referencing "kubernetes-extensions":

```
# kubectl logs -n kube-system kube-apiserver-<tkc>-#####-##### | grep extension | tail -n5
E1002 ##:##:##.00       1 authentication.go:74] "Unable to authenticate the request" err="[invalid bearer token, Post \"https://localhost:5443/tokenreview?timeout=30s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes-extensions\")]"
```
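To check a longer log excerpt for this specific failure signature, the filtering above can be wrapped in a small shell function. This is a minimal sketch: the function name `has_extensions_cert_error` is hypothetical, and it assumes the relevant log lines contain both the x509 "unknown authority" message and the `kubernetes-extensions` authority name, as in the example output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 if any line read from stdin contains the
# x509 "unknown authority" error for the "kubernetes-extensions" CA,
# and 1 otherwise.
has_extensions_cert_error() {
  # First keep only x509 verification failures, then require the
  # candidate authority name seen in the affected kube-apiserver logs.
  grep -F 'x509: certificate signed by unknown authority' \
    | grep -qF 'kubernetes-extensions'
}
```

Example usage on the affected workload cluster: `kubectl logs -n kube-system kube-apiserver-<tkc>-#####-##### | has_extensions_cert_error && echo "kubernetes-extensions certificate error found"`.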

Comparing the creation timestamps with the certificate validity period shows that the respective times differ. Example output with illustrative timestamps:
```
# kubectl get cluster -n <namespace> <tkc> -o jsonpath='{.metadata.creationTimestamp}'
2025-10-10T12:00:00Z  <--- when cluster was created

# kubectl get certificate -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -iE "creationTimestamp|notAfter|notBefore"
  creationTimestamp: "2025-10-10T12:00:00Z"  <--- when tkg-controller created the Certificate resource
  notAfter: "2035-09-25T12:00:00Z"
  notBefore: "2025-09-27T12:00:00Z"  <--- when cert-manager generated the secret

# kubectl get secret -n <namespace> <tkc>-auth-svc-cert -o yaml | grep -i "creationTimestamp"
  creationTimestamp: "2025-09-27T12:00:00Z"  <--- when secret was created
```
Specific observations:
1. The workload cluster was created on 2025-10-10.
2. Although the corresponding Certificate resource was also created on the same day (2025-10-10), its validity period started on 2025-09-27, before the cluster was created.
3. The corresponding secret was also created on 2025-09-27, before the cluster existed.

Conclusion: The certificate held in the secret, and the secret itself, were generated well before the cluster was created. This implies that they already existed and were not properly cleaned up when the previous cluster with the same name was deleted.
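The comparison in observation 3 can be automated with a short read-only script. This is a sketch under assumptions: the function names `ts_before` and `check_auth_svc_cert` are hypothetical, the script must run in a context with Supervisor access (the same one used for the `kubectl get cluster` command above), and it assumes the timestamps are RFC 3339 UTC values with second precision, such as `2025-10-10T12:00:00Z`, so that comparing their digits is equivalent to comparing the times.

```shell
#!/usr/bin/env bash

# ts_before A B: returns 0 if timestamp A is earlier than timestamp B,
# 1 if it is not, and 2 if either value is not in the expected format.
# Assumes RFC 3339 UTC timestamps with second precision
# (e.g. 2025-10-10T12:00:00Z), which reduce to 14 digits YYYYMMDDhhmmss.
ts_before() {
  local a=${1//[!0-9]/} b=${2//[!0-9]/}
  [[ ${#a} -eq 14 && ${#b} -eq 14 ]] || return 2
  # 10# forces base 10 so the values are never parsed as octal.
  (( 10#$a < 10#$b ))
}

# check_auth_svc_cert NAMESPACE TKC (hypothetical, read-only).
# Returns 0 if secret <tkc>-auth-svc-cert was created before the cluster
# (matching this issue), 1 if not, and 2 if a kubectl call fails.
check_auth_svc_cert() {
  local ns=$1 tkc=$2 cluster_ts secret_ts
  cluster_ts=$(kubectl get cluster -n "$ns" "$tkc" \
    -o jsonpath='{.metadata.creationTimestamp}') || return 2
  secret_ts=$(kubectl get secret -n "$ns" "${tkc}-auth-svc-cert" \
    -o jsonpath='{.metadata.creationTimestamp}') || return 2

  echo "cluster created: $cluster_ts"
  echo "secret created:  $secret_ts"

  if ts_before "$secret_ts" "$cluster_ts"; then
    echo "${tkc}-auth-svc-cert predates cluster ${tkc}" >&2
    return 0
  fi
  return 1
}
```

Example usage: `check_auth_svc_cert <namespace> <tkc>`. An exit status of 0 indicates that the secret predates the cluster, as in the example output above. The script only reads resources and does not modify anything.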

Environment

vSphere Supervisor

vSphere Kubernetes Service 3.5 and earlier

Cause

When a workload cluster is deleted, the corresponding secret <tkc>-auth-svc-cert on the Supervisor is deleted as well. However, because the cert-manager service does not expect this deletion, it re-creates the same secret. The re-created secret lacks its proper association with the deleted workload cluster and is therefore left in an orphaned state. The workload cluster deletion itself still proceeds and succeeds.

If a new workload cluster is later provisioned under the same name, secret/<tkc>-auth-svc-cert already exists and is therefore re-used. This results in mismatched certificates between the Supervisor and the workload cluster, and consequently in failed login attempts.
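Based on the cause described above, a read-only pre-check before provisioning a workload cluster under a previously used name can reveal whether a leftover secret is already present. This is a hedged sketch: the function name `auth_svc_secret_exists` and the `KUBECTL` override variable (used only so the function can be exercised without a live Supervisor) are hypothetical. If the secret exists, do not delete or modify it on your own; follow the Workaround section of this article instead.

```shell
#!/usr/bin/env bash

# auth_svc_secret_exists NAMESPACE TKC (hypothetical, read-only).
# Returns 0 if secret <tkc>-auth-svc-cert already exists in NAMESPACE,
# 1 if it does not, and 2 if the kubectl call itself fails.
# KUBECTL is a hypothetical override that defaults to "kubectl".
auth_svc_secret_exists() {
  local ns=$1 tkc=$2 out
  # --ignore-not-found makes a missing secret exit 0 with empty output,
  # so real errors (e.g. missing permissions) are distinguishable
  # from "not found".
  out=$("${KUBECTL:-kubectl}" get secret -n "$ns" "${tkc}-auth-svc-cert" \
    --ignore-not-found -o name) || return 2
  [[ -n $out ]]
}
```

Example usage against the Supervisor: `auth_svc_secret_exists <namespace> <tkc> && echo "secret <tkc>-auth-svc-cert already exists"`.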

Resolution

Engineering is aware of this issue, and it will be addressed in a future release.

Workaround

For a workaround, please contact Broadcom Support and reference this KB article. The workaround involves several potentially invasive steps and should therefore only be carried out after consulting Broadcom Support.

Additional Information

Japanese version: 同じ名前のワークロードクラスターを削除して再作成した後、「kubectl vsphere login」経由でアクセスすると、正しくログインしても「サーバーにログインしている必要があります」というエラーが発生します。 (the Japanese translation of this article's title)
