vSphere TLS thumbprint update using tanzu credentials plugin doesn't update all the cluster objects in Tanzu Kubernetes Grid Management 2.3.x and above

Article ID: 373844

Products

Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Management
VMware Tanzu Kubernetes Grid Plus
VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

After updating the vSphere TLS thumbprint with the tanzu credentials plugin, as described in https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/2.3/using-tkg/workload-clusters-secret.html, the vsphere-csi-controller pods go into a crash loop.

Environment

TKGm 2.3.x and above

Cause

Updating the thumbprint does not update the vspherecsiconfig CR for classy clusters, so the CSI pods fail to start.

Resolution

  • Check the vSphere thumbprint that is set before the update:
========================
[root@CentOS7TestVM ~]# kubectl get vspherecluster -A -o yaml | grep thumbprint
          f:thumbprint: {}
    thumbprint: AC:70:FE:AF:24:70:C7:7E:0B:A7:3E:46:AE:96:83:B6:C4:48:1C:A0
        {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"VSphereCluster","metadata":{"annotations":{},"name":"wld","namespace":"default"},"spec":{"identityRef":{"kind":"Secret","name":"wld"},"server":"192.168.10.79","thumbprint":"AC:70:FE:AF:24:70:C7:7E:0B:A7:3E:46:AE:96:83:B6:C4:48:1C:A0"}}
          f:thumbprint: {}
    thumbprint: AC:70:FE:AF:24:70:C7:7E:0B:A7:3E:46:AE:96:83:B6:C4:48:1C:A0
          f:thumbprint: {}
    thumbprint: AC:70:FE:AF:24:70:C7:7E:0B:A7:3E:46:AE:96:83:B6:C4:48:1C:A0
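The grep output above repeats the same value once per cluster and also includes managedFields and annotation noise. As a minimal sketch, the helpers below reduce that output to the distinct thumbprint values and read the SHA-1 fingerprint that vCenter currently serves, so the two can be compared. The function names `unique_thumbprints` and `vcenter_thumbprint` are hypothetical, and the second one assumes openssl is installed and vCenter is reachable on port 443.

```shell
# Print the distinct thumbprint values found in VSphereCluster YAML read from stdin.
# Only lines whose first field is exactly "thumbprint:" are matched, so the
# managedFields entries ("f:thumbprint: {}") and the JSON annotation are skipped.
unique_thumbprints() {
  awk '$1 == "thumbprint:" { print $2 }' | sort -u
}

# Print the SHA-1 fingerprint of the certificate served by host $1 on port 443.
# Assumption: $1 is the vCenter address and openssl is available locally.
vcenter_thumbprint() {
  openssl s_client -connect "$1:443" </dev/null 2>/dev/null |
    openssl x509 -noout -fingerprint -sha1 | cut -d= -f2
}

# Usage against a live environment:
#   kubectl get vspherecluster -A -o yaml | unique_thumbprints
#   vcenter_thumbprint 192.168.10.79
```

If the two commands print different values after a certificate rotation, the clusters are still configured with the old thumbprint.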

  • After rotating the vCenter certificates, the vsphere-csi-controller pods start crash looping:
kubectl  -n vmware-system-csi logs -l app=vsphere-csi-controller -c vsphere-csi-controller

{"level":"error","time":"2024-08-01T06:35:04.712596664Z","caller":"vsphere/virtualcenter.go:647","msg":"failed to connect to VirtualCenter host: \"192.168.10.79\". Err: Post \"https://192.168.10.79:443/sdk\": host \"192.168.10.79:443\" thumbprint does not match \"AC:70:FE:AF:24:70:C7:7E:0B:A7:3E:46:AE:96:83:B6:C4:48:1C:A0\"","TraceId":"ed223b05-6391-4360-bdea-6aa740a3d6ef","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/mts/release/bora-22280805/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/common/cns-lib/vsphere/virtualcenter.go:647\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/mts/release/bora-22280805/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/vanilla/controller.go:235\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/mts/release/bora-22280805/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/mts/release/bora-22280805/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/driver.go:202\nmain.main\n\t/build/mts/release/bora-22280805/cayman_vsphere_csi_driver/vsphere_csi_driver/src/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/build/mts/release/bora-22280805/compcache/cayman_go/ob-21619204/linux64/src/runtime/proc.go:250"}

 

  • The vsphere-csi package install (pkgi) also goes into a Reconcile failed state:
======
kubectl -n tkg-system get apps mgmt-avi-vsphere-csi
NAME                   DESCRIPTION                                                                       SINCE-DEPLOY   AGE
mgmt-avi-vsphere-csi   Reconcile failed: Deploying: Error (see .status.usefulErrorMessage for details)   57s            5h42m
======

Workload clusters show the same status.
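To confirm that a failing cluster is hitting this thumbprint mismatch and not some other CSI error, the stale value can be pulled out of the controller log shown earlier. This is a small sketch; the function name `stale_thumbprint_from_log` is hypothetical, and it assumes the log lines keep the JSON format shown above, with escaped quotes around the thumbprint.

```shell
# Read vsphere-csi-controller log lines from stdin and print each distinct
# thumbprint quoted in a "thumbprint does not match" error.
# The sed pattern expects the backslash-escaped quotes (\") of the JSON log format.
stale_thumbprint_from_log() {
  sed -n 's/.*thumbprint does not match \\"\([0-9A-Fa-f:]*\)\\".*/\1/p' | sort -u
}

# Usage against a live cluster:
#   kubectl -n vmware-system-csi logs -l app=vsphere-csi-controller \
#     -c vsphere-csi-controller | stale_thumbprint_from_log
```

The printed value should equal the pre-rotation thumbprint; lines without the error produce no output.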

Update the credentials using the Tanzu CLI:
==================================
tanzu mc credentials update --vsphere-thumbprint 55:94:20:8A:D5:6F:EC:13:82:B3:C7:E6:32:27:80:1A:51:9A:0F:98 --cascading -v 9
compatibility file (/root/.config/tanzu/tkg/compatibility/tkg-compatibility.yaml) already exists, skipping download
BOM files inside /root/.config/tanzu/tkg/bom already exists, skipping download
Updating credentials for management cluster "mgmt-avi"
Updating tlsThumbprint for cluster "mgmt-avi"
Applying patch to resource mgmt-avi of type *v1beta1.Cluster ...
Updating credentials for all workload clusters under management cluster "mgmt-avi"
Updating credentials for workload cluster "testcluster" ...
Updating tlsThumbprint for cluster "testcluster"
Applying patch to resource testcluster of type *v1beta1.Cluster ...
Updating credentials for workload cluster "wld" ...
Updating tlsThumbprint for cluster "wld"
Applying patch to resource wld of type *v1beta1.VSphereCluster ...
waiting for resource wld-vsphere-cpi-addon of type *v1.Secret to be up and running
Patching vsphere cpi config credential secret
Applying patch to resource wld-vsphere-cpi-addon of type *v1.Secret ...
waiting for resource wld-vsphere-csi-addon of type *v1.Secret to be up and running
Patching vsphere csi config credential secret
Applying patch to resource wld-vsphere-csi-addon of type *v1.Secret ...
Credentials for management cluster is being updated
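
Because the update above cascades to every workload cluster, it can be worth checking the thumbprint string for copy-and-paste damage before passing it to --vsphere-thumbprint. The sketch below only checks the shape used throughout this article (20 colon-separated uppercase hex bytes, that is, a SHA-1 fingerprint). The function name `is_sha1_thumbprint` is hypothetical, and whether the CLI also accepts other forms is not covered here.

```shell
# Succeed only if $1 looks like a SHA-1 thumbprint in the form shown in this
# article: 20 uppercase hex bytes separated by colons.
is_sha1_thumbprint() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-F]{2}:){19}[0-9A-F]{2}$'
}

# Usage:
#   is_sha1_thumbprint "$NEW_THUMBPRINT" &&
#     tanzu mc credentials update --vsphere-thumbprint "$NEW_THUMBPRINT" --cascading -v 9
```

`NEW_THUMBPRINT` is an illustrative variable name, not something the CLI defines.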

  • For a legacy workload cluster, delete the vsphere-csi pkgi. It is recreated automatically and the nodes are re-registered, after which the csinode and csinodetopologies objects are created with the correct thumbprint.

On the Legacy workload cluster
============
[root@CentOS7TestVM ~]# kubectl get pkgi -A
NAMESPACE    NAME                                PACKAGE NAME                                         PACKAGE VERSION                  DESCRIPTION                                                            AGE
tkg-system   antrea                              antrea.tanzu.vmware.com                              1.11.2+vmware.1-tkg.1-advanced   Reconcile succeeded                                                    6h58m
tkg-system   load-balancer-and-ingress-service   load-balancer-and-ingress-service.tanzu.vmware.com   1.10.2+vmware.1-tkg.1            Reconcile succeeded                                                    6h58m
tkg-system   metrics-server                      metrics-server.tanzu.vmware.com                      0.6.2+vmware.1-tkg.4             Reconcile succeeded                                                    6h58m
tkg-system   secretgen-controller                secretgen-controller.tanzu.vmware.com                0.14.2+vmware.2-tkg.3            Reconcile succeeded                                                    6h58m
tkg-system   vsphere-cpi                         vsphere-cpi.tanzu.vmware.com                         1.27.0+vmware.1-tkg.1            Reconcile succeeded                                                    6h58m
tkg-system   vsphere-csi                         vsphere-csi.tanzu.vmware.com                         3.0.2+vmware.2-tkg.1             Reconcile failed: Error (see .status.usefulErrorMessage for details)   6h58m
[root@CentOS7TestVM ~]#
[root@CentOS7TestVM ~]# kubectl delete pkgi vsphere-csi -n tkg-system
packageinstall.packaging.carvel.dev "vsphere-csi" deleted
[root@CentOS7TestVM ~]# kubectl get pkgi -A
NAMESPACE    NAME                                PACKAGE NAME                                         PACKAGE VERSION                  DESCRIPTION           AGE
tkg-system   antrea                              antrea.tanzu.vmware.com                              1.11.2+vmware.1-tkg.1-advanced   Reconcile succeeded   7h1m
tkg-system   load-balancer-and-ingress-service   load-balancer-and-ingress-service.tanzu.vmware.com   1.10.2+vmware.1-tkg.1            Reconcile succeeded   7h1m
tkg-system   metrics-server                      metrics-server.tanzu.vmware.com                      0.6.2+vmware.1-tkg.4             Reconcile succeeded   7h1m
tkg-system   secretgen-controller                secretgen-controller.tanzu.vmware.com                0.14.2+vmware.2-tkg.3            Reconcile succeeded   7h1m
tkg-system   vsphere-cpi                         vsphere-cpi.tanzu.vmware.com                         1.27.0+vmware.1-tkg.1            Reconcile succeeded   7h1m
tkg-system   vsphere-csi                         vsphere-csi.tanzu.vmware.com                         3.0.2+vmware.2-tkg.1             Reconcile succeeded   117s
[root@CentOS7TestVM ~]#
[root@CentOS7TestVM ~]#

[root@CentOS7TestVM ~]# kubectl -n vmware-system-csi get po
NAME                                      READY   STATUS    RESTARTS   AGE
vsphere-csi-controller-557755f9b9-dnqxq   7/7     Running   0          2m18s
vsphere-csi-node-lc9k2                    3/3     Running   0          2m18s
vsphere-csi-node-x6jss                    3/3     Running   0          2m18s
[root@CentOS7TestVM ~]#
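
When several clusters are affected, the pkgi listing above can be filtered down to just the entries that need attention. A minimal sketch, assuming the column layout of `kubectl get pkgi -A` shown above (namespace first, name second); the function name `failed_pkgi` is hypothetical.

```shell
# Read `kubectl get pkgi -A` output from stdin and print namespace/name for
# every package install whose description contains "Reconcile failed".
# NR > 1 skips the header row.
failed_pkgi() {
  awk 'NR > 1 && /Reconcile failed/ { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pkgi -A | failed_pkgi
```

An empty result means no package install is currently reporting a failed reconcile.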

 

  • For a classy cluster, one additional step is required: update the vspherecsiconfig CR with the new thumbprint.

    Set the kubectl context to the management cluster and run the following commands:

kubectl edit vspherecsiconfigs.csi.tanzu.vmware.com -n tkg-system mgmt-avi
vspherecsiconfig.csi.tanzu.vmware.com/mgmt-avi edited
kubectl edit vspherecsiconfigs.csi.tanzu.vmware.com testcluster
vspherecsiconfig.csi.tanzu.vmware.com/testcluster edited
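
The interactive edits above could also be scripted with `kubectl patch`. The sketch below only builds the merge-patch body. It assumes the thumbprint lives at `spec.vsphereCSI.config.tlsThumbprint` in the vspherecsiconfig object. This article does not state the field path, so confirm it first with `kubectl get vspherecsiconfigs.csi.tanzu.vmware.com -n tkg-system mgmt-avi -o yaml`. The function name `thumbprint_patch` and the variable `NEW_THUMBPRINT` are hypothetical.

```shell
# Print a JSON merge-patch body that sets the thumbprint to $1.
# ASSUMPTION: the field path spec.vsphereCSI.config.tlsThumbprint is not taken
# from this article; verify it against the live object before patching.
thumbprint_patch() {
  printf '{"spec":{"vsphereCSI":{"config":{"tlsThumbprint":"%s"}}}}\n' "$1"
}

# Usage on the management cluster context (same objects as the edits above):
#   kubectl patch vspherecsiconfigs.csi.tanzu.vmware.com mgmt-avi -n tkg-system \
#     --type merge -p "$(thumbprint_patch "$NEW_THUMBPRINT")"
#   kubectl patch vspherecsiconfigs.csi.tanzu.vmware.com testcluster \
#     --type merge -p "$(thumbprint_patch "$NEW_THUMBPRINT")"
```

A merge patch leaves every other field of the object untouched, which is what makes it a reasonable substitute for a one-line manual edit.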

 

  • The commands above edit the vspherecsiconfig for both the management cluster (mgmt-avi) and the classy workload cluster named testcluster.

    Once the vspherecsiconfig object is updated, delete the vsphere-csi pkgi for the respective clusters:

On the Classy workload cluster
====================

[root@CentOS7TestVM ~]# kubectl get pkgi -A
NAMESPACE    NAME                                            PACKAGE NAME                                         PACKAGE VERSION                  DESCRIPTION                                                            AGE
tkg-system   testcluster-antrea                              antrea.tanzu.vmware.com                              1.11.2+vmware.1-tkg.1-advanced   Reconcile succeeded                                                    6h8m
tkg-system   testcluster-capabilities                        capabilities.tanzu.vmware.com                        0.31.0+vmware.1                  Reconcile succeeded                                                    6h8m
tkg-system   testcluster-load-balancer-and-ingress-service   load-balancer-and-ingress-service.tanzu.vmware.com   1.10.2+vmware.1-tkg.1            Reconcile succeeded                                                    6h8m
tkg-system   testcluster-metrics-server                      metrics-server.tanzu.vmware.com                      0.6.2+vmware.1-tkg.4             Reconcile succeeded                                                    6h8m
tkg-system   testcluster-pinniped                            pinniped.tanzu.vmware.com                            0.24.0+vmware.1-tkg.2            Reconcile succeeded                                                    6h8m
tkg-system   testcluster-secretgen-controller                secretgen-controller.tanzu.vmware.com                0.14.2+vmware.2-tkg.3            Reconcile succeeded                                                    6h8m
tkg-system   testcluster-tkg-storageclass                    tkg-storageclass.tanzu.vmware.com                    0.31.0+vmware.1                  Reconcile succeeded                                                    6h8m
tkg-system   testcluster-vsphere-cpi                         vsphere-cpi.tanzu.vmware.com                         1.27.0+vmware.1-tkg.1            Reconcile succeeded                                                    6h8m
tkg-system   testcluster-vsphere-csi                         vsphere-csi.tanzu.vmware.com                         3.0.2+vmware.2-tkg.1             Reconcile failed: Error (see .status.usefulErrorMessage for details)   6h8m
[root@CentOS7TestVM ~]#
[root@CentOS7TestVM ~]# kubectl delete pkgi testcluster-vsphere-csi -n tkg-system
packageinstall.packaging.carvel.dev "testcluster-vsphere-csi" deleted
[root@CentOS7TestVM ~]# kubectl get pkgi -A
NAMESPACE    NAME                                            PACKAGE NAME                                         PACKAGE VERSION                  DESCRIPTION           AGE
tkg-system   testcluster-antrea                              antrea.tanzu.vmware.com                              1.11.2+vmware.1-tkg.1-advanced   Reconcile succeeded   6h9m
tkg-system   testcluster-capabilities                        capabilities.tanzu.vmware.com                        0.31.0+vmware.1                  Reconcile succeeded   6h9m
tkg-system   testcluster-load-balancer-and-ingress-service   load-balancer-and-ingress-service.tanzu.vmware.com   1.10.2+vmware.1-tkg.1            Reconcile succeeded   6h9m
tkg-system   testcluster-metrics-server                      metrics-server.tanzu.vmware.com                      0.6.2+vmware.1-tkg.4             Reconcile succeeded   6h9m
tkg-system   testcluster-pinniped                            pinniped.tanzu.vmware.com                            0.24.0+vmware.1-tkg.2            Reconcile succeeded   6h9m
tkg-system   testcluster-secretgen-controller                secretgen-controller.tanzu.vmware.com                0.14.2+vmware.2-tkg.3            Reconcile succeeded   6h9m
tkg-system   testcluster-tkg-storageclass                    tkg-storageclass.tanzu.vmware.com                    0.31.0+vmware.1                  Reconcile succeeded   6h9m
tkg-system   testcluster-vsphere-cpi                         vsphere-cpi.tanzu.vmware.com                         1.27.0+vmware.1-tkg.1            Reconcile succeeded   6h9m
tkg-system   testcluster-vsphere-csi                         vsphere-csi.tanzu.vmware.com                         3.0.2+vmware.2-tkg.1             Reconcile succeeded   18s
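
In the session above the recreated pkgi reports Reconcile succeeded after a short delay. Rather than re-running `kubectl get pkgi` by hand, a small polling helper can wait for that state. This is a generic sketch: `wait_for` and `pkgi_reconciled` are hypothetical names, and the attempt count and delay in the usage line are arbitrary.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Succeed when the pkgi named $1 in tkg-system reports "Reconcile succeeded".
# A missing pkgi (still being recreated) counts as not yet reconciled.
pkgi_reconciled() {
  kubectl get pkgi "$1" -n tkg-system 2>/dev/null | grep -q 'Reconcile succeeded'
}

# Usage after deleting the pkgi:
#   wait_for 30 10 pkgi_reconciled testcluster-vsphere-csi
```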

On the Management cluster 
====================
[root@CentOS7TestVM ~]# kubectl get pkgi mgmt-avi-vsphere-csi -n tkg-system
NAME                   PACKAGE NAME                   PACKAGE VERSION        DESCRIPTION                                                            AGE
mgmt-avi-vsphere-csi   vsphere-csi.tanzu.vmware.com   3.0.2+vmware.2-tkg.1   Reconcile failed: Error (see .status.usefulErrorMessage for details)   7h10m
[root@CentOS7TestVM ~]#
[root@CentOS7TestVM ~]# kubectl delete pkgi mgmt-avi-vsphere-csi -n tkg-system
packageinstall.packaging.carvel.dev "mgmt-avi-vsphere-csi" deleted
[root@CentOS7TestVM ~]# kubectl get pkgi mgmt-avi-vsphere-csi -n tkg-system
NAME                   PACKAGE NAME                   PACKAGE VERSION        DESCRIPTION           AGE
mgmt-avi-vsphere-csi   vsphere-csi.tanzu.vmware.com   3.0.2+vmware.2-tkg.1   Reconcile succeeded   93s
[root@CentOS7TestVM ~]# kubectl -n vmware-system-csi get po
NAME                                      READY   STATUS    RESTARTS   AGE
vsphere-csi-controller-59565f4949-rdqlh   7/7     Running   0          2m19s
vsphere-csi-node-5t879                    3/3     Running   0          2m19s
vsphere-csi-node-k7pjt                    3/3     Running   0          2m19s
[root@CentOS7TestVM ~]#
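
As a final check, the pod listing above can be evaluated mechanically: every CSI pod should be Running with all of its containers ready. A minimal sketch, assuming the default column layout of `kubectl -n vmware-system-csi get po` (NAME, READY, STATUS); the function name `all_pods_ready` is hypothetical.

```shell
# Read `kubectl get po` output (single namespace, default columns) from stdin.
# Succeed only if at least one pod is listed and every pod is Running with
# READY showing n/n.
all_pods_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") bad = 1
         seen = 1
       }
       END {
         if (seen && !bad) exit 0
         exit 1
       }'
}

# Usage against a live cluster:
#   kubectl -n vmware-system-csi get po | all_pods_ready && echo "CSI pods healthy"
```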