TKG management cluster not able to connect to vCenter after vCenter cert or thumbprint change
Article ID: 337406

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:

  • capv-controller-manager pod logs on the management cluster show similar errors to below:

    E1220 17:33:48.452774 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="unexpected error while probing vcenter for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereCluster uploader-prod/uploader-autoscale: Post \"https://VCENTER-FQDN/sdk\": host \"VCENTER-FQDN:443\" thumbprint does not match \"8C:23:7D:32:D7:E5:5B:90:30:54:49:9C:76:EB:1C:37:69:FA:AA:C1\"" "controller"="vspherecluster" "name"="uploader-autoscale" "namespace"="uploader-prod"

  • You are unable to create or scale any workload clusters



Environment

VMware Tanzu Kubernetes Grid v1.5 to v2.5

Cause

This error occurs when the vCenter certificate thumbprint changes but is not updated in the TKG management cluster objects and/or in the workload cluster objects' metadata.

Resolution

This workaround is not valid for TKG stretch clusters deployed by Telco Cloud Automation. Please contact VMware support for assistance in updating TCA stretch cluster deployments.

For TKGm v2.3 and above, use the "tanzu mc credentials update" command to update the thumbprint in the Management Cluster and its Workload Clusters.  See the steps in Update Cluster Credentials for more info.
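
A hedged sketch of what that invocation can look like is shown below. Flag names vary by CLI version, so confirm them with "tanzu mc credentials update --help" and the Update Cluster Credentials documentation; MGMT-CLUSTER-NAME and the credential values are placeholders.

    # Sketch only - verify the exact flag names for your CLI version before running
    tanzu mc credentials update MGMT-CLUSTER-NAME \
      --vsphere-user 'administrator@vsphere.local' \
      --vsphere-password 'VSPHERE-PASSWORD' \
      --vsphere-thumbprint 'NEW-THUMBPRINT-VALUE'

    # A --cascading option may be available in newer CLI versions to also update the
    # Workload Clusters in one pass; check "tanzu mc credentials update --help"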

 

For TKGm v2.1 and v2.2, use the following procedure.

Update the workload cluster and management cluster data with the following steps. This should not impact the currently running nodes, since it only updates the node metadata.


Note: Update the exported yaml file with the new thumbprint value before replacing the secret. As a best practice, verify that the secret contains the new thumbprint after the replace.
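
If you still need to obtain the new vCenter thumbprint, one option is to read it directly from vCenter with openssl. This is a sketch, assuming openssl is available on the machine you run it from and VCENTER-FQDN is your vCenter address; the colon-separated SHA-1 fingerprint in the output is the value used in the steps below.

    # Print the SHA-1 fingerprint of the certificate currently served by vCenter
    echo | openssl s_client -connect VCENTER-FQDN:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1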

 

To update the TLS thumbprint on each workload cluster:

In each of the commands, make sure to replace the string "WC" with your Workload Cluster name.

  1. In the Management Cluster context, update the {clustername}-vsphere-cpi-addon secret

    # Get the actual secret name for your workload cluster
    kubectl get secret -A | grep cpi-addon

    # Save the data values information of the secret into a yaml file.  Make sure that the secret name here is correct and the same as the actual secret name as retrieved in the above command.
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > WC-vsphere-cpi-addon.yml

    # Open the yaml file in your favorite editor and change the thumbprint information.

    # Update the secret with the modified yaml file.
    kubectl create secret generic WC-vsphere-cpi-addon --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=WC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

    # Add labels to the secret
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/cluster-name=WC
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/addon-name=vsphere-cpi
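
    # Optional check (per the note above): confirm the replaced secret now shows the new thumbprint
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint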


  2. In the Workload Cluster context, verify that the secret vsphere-cpi-data-values in the tkg-system namespace has been updated. This should reconcile automatically after the secret above has been updated.

    # The output of this command should show the new thumbprint info
    kubectl -n tkg-system get secret vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint



  3. Verify that the configmap is updated using the command below in the workload cluster context:

    # The output of this command should show the new thumbprint info
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml


  4. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted 
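
    # One way to restart the pod, in the Workload Cluster context: delete it and let its
    # controller recreate it with the updated configmap mounted (look up the exact pod name first)
    kubectl -n kube-system get pods | grep vsphere-cloud-controller-manager
    kubectl -n kube-system delete pod <vsphere-cloud-controller-manager-pod-name>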

Note that the procedures above should be performed in each Workload Cluster.

 

To update the TLS thumbprint on the management cluster:

In each of the following commands, make sure to replace the string "MC" with your Management Cluster name.

  1. In the Management Cluster context, update the {management-clustername}-vsphere-cpi-data-values secret in the tkg-system namespace

    # Get the actual secret name for your cluster
    kubectl -n tkg-system get secret | grep vsphere-cpi

    # Save the data values information of the secret into a yaml file.  Make sure that the secret name here is correct and the same as the actual secret name as retrieved in the above command
    kubectl -n tkg-system get secret MC-vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d > MC-vsphere-cpi-data-values.yml


    # Open the yaml file in your favorite editor and change the thumbprint information.

    # Update the secret with the modified yaml file.
    kubectl create secret generic MC-vsphere-cpi-data-values -n tkg-system --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=MC-vsphere-cpi-data-values.yml --dry-run=client -o yaml | kubectl replace -f -


    # Add labels to the secret
    kubectl label secret MC-vsphere-cpi-data-values -n tkg-system tkg.tanzu.vmware.com/cluster-name=MC
    kubectl label secret MC-vsphere-cpi-data-values -n tkg-system tkg.tanzu.vmware.com/addon-name=vsphere-cpi
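
    # Optional check (per the note above): confirm the replaced secret now shows the new thumbprint
    kubectl -n tkg-system get secret MC-vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint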


The vSphere TLS thumbprint also needs to be updated in the "vspherecluster" (or "cluster") and "vspherevm" objects. These have to be updated in all clusters.

 

  1. In the Management Cluster context, list all the vsphereclusters and clusters including the management cluster and note down their names as those will be needed in the next steps.

    kubectl get vsphereclusters -A
    kubectl get clusters -A


  2. For each of the clusters, edit the vspherecluster OR cluster object and update spec.thumbprint.

    If it's a legacy (non-classy) Workload Cluster, then edit the vspherecluster object and update the spec.thumbprint.

        kubectl edit vspherecluster WC

    Otherwise, if it's a classy Workload Cluster OR a Management Cluster, then edit the cluster object and update the spec.thumbprint.

        kubectl edit cluster WC


  3. Verify that the update is complete using one of the commands below:

    kubectl get vspherecluster WC -o yaml

    OR

    kubectl get cluster WC -o yaml



  4. Restart the vsphere-cloud-controller-manager pod in the kube-system namespace in the Management Cluster.


  5. Scale down the CAPV deployment in the management cluster using the following command:


    kubectl scale deploy -n capv-system capv-controller-manager --replicas=0


  6. Update the webhook configurations in the management cluster to allow updates to the VSphereVM objects:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'

    kubectl patch mutatingwebhookconfiguration capv-mutating-webhook-configuration --patch '{"webhooks": [{"name": "default.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'


  7. For each cluster, update the VSphereVM objects with the new thumbprint value using the following commands:

    - Update the thumbprint on all the VSphereVM objects of the cluster <name-of-cluster> in the namespace <ns-of-cluster>

        kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> --no-headers=true | awk '{print $1}' | xargs kubectl patch vspherevm -n <ns-of-cluster> --type='merge' --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'


    - Confirm the updates on the VSphereVM objects by checking for the new thumbprint in the output of the following command:

            kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> -oyaml | grep thumbprint

    Note that you have to perform the above two commands in each cluster.
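
    As a convenience, the two commands above can be wrapped in a small loop over all clusters. This is only a sketch: it assumes a bash shell, that "kubectl get clusters -A" lists every cluster that needs the update, and that <new-thumbprint-value> is replaced with the new thumbprint before running. Clusters that have no VSphereVM objects may cause kubectl patch to print an error, which can be ignored.

        # Patch the VSphereVM objects of every cluster with the new thumbprint (sketch)
        kubectl get clusters -A --no-headers | while read ns cluster rest; do
          kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=$cluster -n $ns --no-headers=true | awk '{print $1}' | \
            xargs kubectl patch vspherevm -n $ns --type='merge' --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'
        done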


  8. Revert the changes to the webhook configurations in the management cluster by running the following commands:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'

    kubectl patch mutatingwebhookconfiguration capv-mutating-webhook-configuration --patch '{"webhooks": [{"name": "default.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'


  9. Scale back up the CAPV deployment using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=1


 

 

For TKGm v1.5 and v1.6, use the following steps.

Update the workload cluster and management cluster data with the following steps. This should not impact the currently running nodes, since it only updates the node metadata.


Note: Update the exported yaml file with the new thumbprint value before replacing the secret. As a best practice, verify that the secret contains the new thumbprint after the replace.

 

To update the TLS thumbprint on each workload cluster:

In each of the commands, make sure to replace the string "WC" with your Workload Cluster name.

  1. In the Management Cluster context, update the {clustername}-vsphere-cpi-addon secret

    # Get the actual secret name for your workload cluster
    kubectl get secret -A | grep cpi-addon

    # Save the data values information of the secret into a yaml file.  Make sure that the secret name here is correct and the same as the actual secret name as retrieved in the above command.
    kubectl get secret WC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > WC-vsphere-cpi-addon.yml

    # Open the yaml file in your favorite editor and change the thumbprint information.

    # Update the secret with the modified yaml file.
    kubectl create secret generic WC-vsphere-cpi-addon --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=WC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

    # Add labels to the secret
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/cluster-name=WC
    kubectl label secret WC-vsphere-cpi-addon tkg.tanzu.vmware.com/addon-name=vsphere-cpi


  2. In the Workload Cluster context, verify that the secret vsphere-cpi-data-values in the tkg-system namespace has been updated. This should reconcile automatically after the secret above has been updated.

    # The output of this command should show the new thumbprint info
    kubectl -n tkg-system get secret vsphere-cpi-data-values -o jsonpath={.data.values\\.yaml} | base64 -d | grep -i thumbprint



  3. Verify that the configmap is updated using the command below in the workload cluster context:

    # The output of this command should show the new thumbprint info
    kubectl -n kube-system get cm vsphere-cloud-config -o yaml


  4. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted 

Note that the procedures above should be performed in each Workload Cluster.

 

To update the TLS thumbprint on the management cluster:

In each of the following commands, make sure to replace the string "MC" with your Management Cluster name.

  1. In the Management Cluster context, update the {management-clustername}-vsphere-cpi-addon secret in the tkg-system namespace

    # Get the actual secret name for your Management Cluster
    kubectl -n tkg-system get secret | grep vsphere-cpi  

    # Save the data values information of the secret into a yaml file.  Make sure that the secret name here is correct and the same as the actual secret name as retrieved in the above command.
    kubectl -n tkg-system get secret MC-vsphere-cpi-addon -o jsonpath={.data.values\\.yaml} | base64 -d > MC-vsphere-cpi-addon.yml

    # Open the yaml file in your favorite editor and change the thumbprint information.  

    # Update the secret with the modified yaml file.
    kubectl create secret generic MC-vsphere-cpi-addon -n tkg-system --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=MC-vsphere-cpi-addon.yml --dry-run=client -o yaml | kubectl replace -f -

    # Add labels to the secret
    kubectl label secret MC-vsphere-cpi-addon -n tkg-system tkg.tanzu.vmware.com/cluster-name=MC
    kubectl label secret MC-vsphere-cpi-addon -n tkg-system tkg.tanzu.vmware.com/addon-name=vsphere-cpi

  2. Verify the configmap is updated using the below command:

    kubectl -n kube-system get cm vsphere-cloud-config -o yaml


  3. Restart the vsphere-cloud-controller-manager pod so that the new configmap is mounted 

 

The vSphere TLS thumbprint also needs to be updated in the "vspherecluster" and "vspherevm" objects. These have to be updated in all clusters.

  1. In the Management Cluster context, list all the vsphereclusters including the management cluster and note down their names as those will be needed in the next steps.

    kubectl get vsphereclusters -A
    NAMESPACE              NAME              AGE
    default                tkg-test          62d
    default                tkg-wld           83d
    tkg-system             tkg-mgmt          83d


  2. For each of the clusters, edit the vspherecluster CR and update spec.thumbprint

    kubectl edit vspherecluster WC
     

  3. Verify that the update is complete using the command below:

    kubectl get vspherecluster WC -o yaml


  4. Scale down the CAPV deployment in the management cluster context using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=0


  5. Update the CAPV validating webhook configuration in the management cluster to allow updates to the VSphereVM objects:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Ignore"}]}'


  6. For each cluster, update the VSphereVM objects with the new thumbprint value using the following commands:

    - Update the thumbprint on all the VSphereVM objects of the cluster <name-of-cluster> in the namespace <ns-of-cluster>

        kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> --no-headers=true | awk '{print $1}' | xargs kubectl patch vspherevm -n <ns-of-cluster> --type='merge' --patch '{"spec":{"thumbprint":"<new-thumbprint-value>"}}'


    - Confirm the updates on the VSphereVM objects by checking for the new thumbprint in the output of the following command:

            kubectl get vspherevm -l cluster.x-k8s.io/cluster-name=<name-of-cluster> -n <ns-of-cluster> -oyaml | grep thumbprint

    Note that you have to perform the above two commands in each cluster.

  7. Revert the changes to the webhook configuration in the management cluster by running the following command:

    kubectl patch validatingwebhookconfiguration capv-validating-webhook-configuration --patch '{"webhooks": [{"name": "validation.vspherevm.infrastructure.x-k8s.io", "failurePolicy": "Fail"}]}'


  8. Scale back up the CAPV deployment using the following command:

    kubectl scale deploy -n capv-system capv-controller-manager --replicas=1