Management cluster not able to connect after cert or thumbprint change

Article ID: 337406


Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

  • The management cluster is not able to connect to vCenter after a vCenter certificate or thumbprint change

  • The capv-controller-manager pod logs on the management cluster show errors similar to the following:

    E1220 17:33:48.452774 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="unexpected error while probing vcenter for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereCluster uploader-prod/uploader-autoscale: Post \"https://VCENTER-FQDN/sdk\": host \"VCENTER-FQDN:443\" thumbprint does not match \"<THUMBPRINT>\"" "controller"="vspherecluster" "name"="uploader-autoscale" "namespace"="uploader-prod"

  • You are unable to create or scale any workload clusters

  • If using an autoscaler, nodes will continue to attempt to provision and recreate based on the settings in the cluster
  • If using Calico, you will see errors similar to the following:
    2025-04-02 22:29:09.930 [WARNING][85] felix/ipip_mgr.go 111: Failed to add IPIP tunnel device error=exit status 1
    2025-04-02 22:29:09.930 [WARNING][85] felix/ipip_mgr.go 88: Failed configure IPIP tunnel device, retrying... error=exit status



Environment

VMware Tanzu Kubernetes Grid v1.5 to v2.x

Cause

This error occurs when the vCenter certificate thumbprint changes but is not updated in the TKG management cluster objects and/or in the workload cluster object's metadata.

Resolution

First, confirm whether there is a mismatch:

  1. In the Management Cluster context, inspect the {clustername}-vsphere-cpi-addon secret for the affected cluster.
  2. Get the actual secret name for your workload cluster
    kubectl get secret -A | grep cpi-addon

  3. Save the data values information of the secret into a yaml file. Make sure the secret name here matches the actual secret name returned by the command above.
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > WC-vsphere-cpi-addon.yaml

  4. Open the yaml file and confirm that the thumbprint it contains matches the current vCenter certificate thumbprint.

  5. Confirm that the labels are present and correct.
    kubectl get secret WC-vsphere-cpi-addon --show-labels
    (the labels should look something like tkg.tanzu.vmware.com/cluster-name=WC and tkg.tanzu.vmware.com/addon-name=vsphere-cpi)

  6. In the Workload Cluster context, verify the thumbprint in the vsphere-cpi-data-values secret. The output of this command should show the thumbprint info:
    kubectl -n tkg-system get secret vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint

  7. The output of this command should also show the thumbprint information. Compare the two outputs to double-check that the correct thumbprint is present in both locations:
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml
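
When comparing, it also helps to know the thumbprint that vCenter is currently presenting. A minimal way to retrieve it, assuming you can reach the vCenter FQDN on port 443 from a machine with openssl installed (replace VCENTER-FQDN with your vCenter address):

    # Print the SHA-1 fingerprint of the certificate currently served by vCenter
    echo | openssl s_client -connect VCENTER-FQDN:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1

The colon-separated fingerprint in the output is the value that should appear as the thumbprint in the secrets and objects referenced throughout this article.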

This workaround is not valid for TKG stretch clusters deployed by Telco Cloud Automation. Please contact VMware support for assistance with updating TCA stretch cluster deployments.

For TKGm v2.3 and above

Use the "tanzu mc credentials update" command to update the thumbprint in the Management Cluster and its Workload Clusters.   See the steps in Update Cluster Credentials for more info.
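
As an illustrative sketch only (the flag names here are assumptions based on the TKG v2.3 documentation and may differ between releases, so confirm them with "tanzu mc credentials update --help" and the Update Cluster Credentials steps), the command is typically run against the management cluster with the new thumbprint and vSphere credentials:

    # Hypothetical example: update the vSphere credentials and TLS thumbprint for management cluster MGMT-CLUSTER
    tanzu mc credentials update MGMT-CLUSTER --vsphere-user 'administrator@vsphere.local' --vsphere-password 'PASSWORD' --vsphere-thumbprint 'NEW-THUMBPRINT'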

For TKGm v2.1 and v2.2

Update the workload cluster and management cluster data by following these steps. This should not impact the currently running nodes, as this updates the node metadata.

Note: Please update the exported yaml file with the new value of thumbprint before replacing the secret. As a best practice, verify that the secret has been updated with the new thumbprint after the replacement.
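
For reference, the thumbprint typically appears in the exported values.yaml under the vsphereCPI block, similar to the illustrative snippet below (key names can vary slightly between TKG versions, so locate the field in your own file with grep -i thumbprint):

    vsphereCPI:
      server: VCENTER-FQDN
      datacenter: DATACENTER-NAME
      username: administrator@vsphere.local
      tlsThumbprint: "AA:BB:CC:DD:...:FF"   # replace with the new SHA-1 thumbprint

Only the thumbprint value needs to change; leave the other fields as they are.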



To update the TLS thumbprint on each of the workload clusters:

In each of the commands, make sure to replace the string "WC" with your Workload Cluster name.

  1. In the Management Cluster context, update the {clustername}-vsphere-cpi-addon secret.

  2. Get the actual secret name for your workload cluster
    kubectl get secret -A | grep cpi-addon

  3. Save the data values information of the secret into a yaml file. Make sure the secret name here matches the actual secret name returned by the command above.
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > WC-vsphere-cpi-addon.yml

  4. Update the secret with the modified yaml file.
    kubectl create secret generic WC-vsphere-cpi-addon --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=WC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

  5. Add labels to the secret
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/cluster-name=WC
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/addon-name=vsphere-cpi

  6. In the Workload Cluster context, verify that the secret vsphere-cpi-data-values in the tkg-system namespace has been updated. This should have been reconciled after the above secret was updated.  

  7. The output of this command should show the new thumbprint info
    kubectl -n tkg-system get secret vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint

  8. Verify the configmap is updated using the below command in the workload cluster context.

  9. The output of this command should show the new thumbprint info:
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml

  10. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted.
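
A minimal way to perform the restart in step 10, assuming the pod runs in the kube-system namespace and is recreated automatically by its controller:

    # Find the vsphere-cloud-controller-manager pod(s), then delete them so they restart with the updated configmap
    kubectl -n kube-system get pods | grep vsphere-cloud-controller-manager
    kubectl -n kube-system delete pod <vsphere-cloud-controller-manager-pod-name>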

Note that the procedures above should be performed in each Workload Cluster.

To update the TLS thumbprint on the management cluster:

In each of the following commands, make sure to replace the string "MC" with your Management Cluster name.

  1. In the Management Cluster context, update the {management-clustername}-vsphere-cpi-data-values secret in the tkg-system namespace.
  2. Get the actual secret name for your cluster.
    kubectl -n tkg-system get secret | grep vsphere-cpi

  3. Save the data values information of the secret into a yaml file. Make sure the secret name here matches the actual secret name retrieved by the command above.
    kubectl -n tkg-system get secret MC-vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d > MC-vsphere-cpi-data-values.yml

  4. Open the yaml file in your favorite editor and change the thumbprint information.

  5. Update the secret with the modified yaml file.
    kubectl create secret generic MC-vsphere-cpi-data-values -n tkg-system --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=MC-vsphere-cpi-data-values.yml --dry-run=client -o yaml | kubectl replace -f -

  6. Add labels to the secret
    kubectl label secret MC-vsphere-cpi-data-values -n tkg-system tkg.tanzu.vmware.com/cluster-name=MC
    kubectl label secret MC-vsphere-cpi-data-values -n tkg-system tkg.tanzu.vmware.com/addon-name=vsphere-cpi


The vSphere TLS Thumbprint also needs to be updated in the "vspherecluster" or "cluster" and "vspherevm" objects.   These must be updated across all clusters.

  1. In the Management Cluster context, list all vsphereclusters and clusters, including the management cluster, and note their names and namespaces for the next steps.
    kubectl get vsphereclusters -A
    kubectl get clusters -A


  2. For each of the clusters, edit the vspherecluster OR cluster object and update spec.thumbprint (a non-interactive patch alternative is sketched after this procedure).
    If it is a legacy (non-classy) Workload Cluster, edit the vspherecluster object and update spec.thumbprint.

    kubectl edit vspherecluster WC

    Otherwise, if it's a classy Workload Cluster OR a Management Cluster, then edit the cluster object and update the spec.thumbprint.

    kubectl edit cluster WC


  3. Verify that the update is complete using the command below:
    kubectl get vspherecluster WC -o yaml

    OR

    kubectl get cluster WC -o yaml


  4. Restart the vsphere-cloud-controller-manager pod in the kube-system namespace in the Management Cluster.


  5. Scale down the CAPV deployment in the management cluster using the following command:


    kubectl scale deploy -n capv-system capv-controller-manager --replicas=0


  6. Update the webhook configurations in the management cluster to allow updates to the VSphereVM objects:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'

    kubectl patch mutatingwebhookconfiguration capv-mutating-webhook-configuration --patch '{"webhooks": [{"name": "default.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'


  7. For each cluster, update the VSphereVM objects with the new thumbprint value using the following commands:

    - Update the thumbprint on all the VSphereVM objects of the cluster <name-of-cluster> in the namespace <ns-of-cluster>:

        kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> --no-headers=true | awk '{print $1}' | xargs kubectl patch vspherevm -n <ns-of-cluster> --type='merge' --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'


    - Confirm the updates by checking for the new thumbprint in the output of the VSphereVM objects by running the following command:

            kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> -oyaml | grep thumbprint

    Note that you have to perform the above two commands in each cluster.


  8. Revert the changes to the webhook configurations in the management cluster by running the following commands:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'

    kubectl patch mutatingwebhookconfiguration capv-mutating-webhook-configuration --patch '{"webhooks": [{"name": "default.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'


  9. Scale back up the CAPV deployment using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=1
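
As referenced in step 2 above, for legacy (non-classy) clusters a non-interactive alternative to kubectl edit is a merge patch on the VSphereCluster object. This is a sketch only, with placeholder cluster name, namespace, and thumbprint:

    # Patch the thumbprint on a legacy workload cluster's VSphereCluster object (replace the placeholders)
    kubectl patch vspherecluster WC -n <ns-of-cluster> --type merge --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'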


 

 

For TKGm v1.5 and v1.6

Update the workload cluster and management cluster data by following these steps. This should not impact the currently running nodes, as this updates the node metadata.

Note: Please update the exported yaml file with the new value of thumbprint before replacing the secret. As a best practice, verify that the secret has been updated with the new thumbprint after the replacement.

 

To update the TLS thumbprint on each of the workload clusters:

In each of the commands, make sure to replace the string "WC" with your Workload Cluster name.

  1. In the Management Cluster context, update the {clustername}-vsphere-cpi-addon secret.

  2. Get the actual secret name for your workload cluster
    kubectl get secret -A | grep cpi-addon

  3. Save the data values information of the secret into a yaml file.  Make sure the secret name here matches the actual secret name retrieved by the command above.
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > WC-vsphere-cpi-addon.yml

  4. Open the yaml file in your favorite editor and change the thumbprint information.

  5. Update the secret with the modified yaml file.
    kubectl create secret generic WC-vsphere-cpi-addon --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=WC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

  6. Add labels to the secret.
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/cluster-name=WC
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/addon-name=vsphere-cpi


  7. In the Workload Cluster context, verify that the secret vsphere-cpi-data-values in the tkg-system namespace has been updated.   It should have been reconciled after the above secret was updated.  

  8. The output of this command should show the new thumbprint info:
    kubectl -n tkg-system get secret vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint


  9. Verify the configmap is updated using the below command on the workload cluster context:
    The output of this command should show the new thumbprint info
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml


  10. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted.

Note that the procedures above should be performed in each Workload Cluster.

 

To update the TLS thumbprint on the management cluster:

In each of the following commands, make sure to replace the string "MC" with your Management Cluster name.

  1. In the Management Cluster context, update the {management-clustername}-vsphere-cpi-addon secret in the tkg-system namespace.

  2. Get the actual secret name for your Management Cluster.
    kubectl -n tkg-system get secret | grep vsphere-cpi

  3. Save the data values information of the secret into a yaml file. Make sure the secret name here matches the actual secret name retrieved by the command above.
    kubectl -n tkg-system get secret MC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > MC-vsphere-cpi-addon.yml

  4. Open the yaml file in your favorite editor and change the thumbprint information.

  5. Update the secret with the modified yaml file.
    kubectl create secret generic MC-vsphere-cpi-addon -n tkg-system --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=MC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

  6. Add labels to the secret.
    kubectl label secret MC-vsphere-cpi-addon -n tkg-system tkg.tanzu.vmware.com/cluster-name=MC
    kubectl label secret MC-vsphere-cpi-addon -n tkg-system tkg.tanzu.vmware.com/addon-name=vsphere-cpi

  7. Verify the configmap is updated using the below command:
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml


  8. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted.

 

The vSphere TLS thumbprint also needs to be updated in the "vspherecluster" and "vspherevm" objects. These must be updated across all clusters.

  1. In the Management Cluster context, list all vsphereclusters, including the management cluster, and record their names, as they will be needed in the next steps.

    kubectl get vsphereclusters -A
    NAMESPACE              NAME              AGE
    default                tkg-test          62d
    default                tkg-wld           83d
    tkg-system             tkg-mgmt          83d


  2. For each of the clusters, edit the vspherecluster CR and update spec.thumbprint. 

    kubectl edit vspherecluster WC
     

  3. Verify that the update is complete using the command below:

    kubectl get vspherecluster WC -o yaml


  4. Scale down the CAPV deployment in the management cluster context using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=0


  5. Update the CAPV validating webhook configuration in the management cluster to allow updates to the VSphereVM objects:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'


  6. For each cluster, update the VSphereVM objects with the new thumbprint value using the following command:

  7. Update the thumbprint on all the VSphereVM objects of the cluster <name-of-cluster> in the namespace <ns-of-cluster>:

    kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> --no-headers=true | awk '{print $1}' | xargs kubectl patch vspherevm -n <ns-of-cluster> --type='merge' --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'


  8. Confirm the updates by checking for the new thumbprint in the output of the VSphereVM objects by running the following command:

    kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> -oyaml | grep thumbprint

    Note that you have to perform the above two commands in each cluster.

  9. Revert the changes to the webhook configuration in the management cluster by running the following command:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'

  10. Scale back up the CAPV deployment using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=1