When the datacenter name is changed in vSphere, or updated manually on a TKGm cluster, changes to the CSI secrets that contain the datacenter value may not be reconciled.
The node-driver-registrar container of the vsphere-csi-node pods on the affected cluster may log an error similar to the following:
E0821 09:26:04.655989 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: "test-cluster-k7uic-7dd856kl9jsgxk-qdayj". Error: "failed to retrieve nodeVM \"424kdll2-4h51-8751-a129-7fjk8oips6as\" using the node manager. Error: datacenter '/datacenter0' not found", restarting registration container
All pods in the vsphere-csi-node DaemonSet are also in the CrashLoopBackOff state.
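To confirm these symptoms from the command line, you can run something like the following sketch against the affected workload cluster. It assumes the vsphere-csi-node pods carry the label app=vsphere-csi-node, which may differ in your deployment, and it wraps each command in a function (the function names are hypothetical) so you can call them by name.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the csi-node pods are labelled app=vsphere-csi-node.

# List the csi-node DaemonSet pods; affected pods show CrashLoopBackOff in STATUS.
check_csi_node_pods() {
  kubectl get pods -n kube-system -l app=vsphere-csi-node
}

# Print the log of the previous (crashed) node-driver-registrar container
# of one pod, where the "datacenter ... not found" error appears.
show_registrar_logs() {
  local pod="$1"
  kubectl logs -n kube-system "$pod" -c node-driver-registrar --previous
}
```

For example, run check_csi_node_pods, then pass the name of one of the crashing pods to show_registrar_logs.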
Due to a bug in this version of TKGm, changes to <cluster-name>-vsphere-csi-data-values on the management cluster are not propagated to the CSI secrets on the workload cluster, including vsphere-config-secret in the kube-system namespace. As a result, the CSI pods keep using the old datacenter value and remain in the CrashLoopBackOff state even after the data values are changed.
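To check which datacenter the CSI driver on the workload cluster is actually configured with, you can decode vsphere-config-secret. This sketch assumes the secret stores its configuration under the key csi-vsphere.conf with a datacenters entry, as in the upstream vSphere CSI driver; the function name is hypothetical, and if the command prints nothing, check the key names in your cluster.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the config lives under the key "csi-vsphere.conf"
# (the dot in the key is escaped for jsonpath) and contains a "datacenters" line.
show_csi_datacenter() {
  kubectl get secret vsphere-config-secret -n kube-system \
    -o jsonpath='{.data.csi-vsphere\.conf}' | base64 -d | grep -i 'datacenters'
}
```

If the output still shows the old datacenter after <cluster-name>-vsphere-csi-data-values was changed, the secret was not updated.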
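To work around the issue, update the vsphereCSIConfig object on the management cluster directly, then restart the CSI pods on the workload cluster.
1. In the management cluster context, list the vsphereCSIConfig objects in the default namespace: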
kubectl get vsphereCSIConfig
2. Edit the vsphereCSIConfig associated with the workload cluster where the issue is observed:
kubectl edit vsphereCSIConfig test-cluster-1
3. Set the "datacenter" field to the correct datacenter and save:
spec:
  vsphereCSI:
    config:
      datacenter: /new-datacenter-id
After you save, the CSI secrets should be updated with the new datacenter. Complete steps 4 and 5 so that the CSI pods load the updated configuration.
4. In the workload cluster context, restart the vsphere-csi-controller deployment:
kubectl rollout restart deployment vsphere-csi-controller -n kube-system
5. Restart the vsphere-csi-node DaemonSet:
kubectl rollout restart daemonset -n kube-system vsphere-csi-node
All vsphere-csi-node pods should now be in the Running state and using the updated datacenter configuration.
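If you prefer to script the procedure, the following sketch wraps steps 3 to 5 in shell functions (the function names are hypothetical). It replaces the interactive kubectl edit with kubectl patch using a JSON merge patch (--type merge), because custom resources such as vsphereCSIConfig do not support kubectl's default strategic merge patch. It also adds kubectl rollout status so the script waits until the restarted pods are ready. The object name and datacenter value are the example values from this article; substitute your own, and run each function against the correct cluster context.

```shell
#!/usr/bin/env bash
# Sketch only; names and values below are examples, not defaults.

# Step 3 (management cluster context): set spec.vsphereCSI.config.datacenter
# non-interactively with a JSON merge patch.
set_csi_datacenter() {
  local config_name="$1" datacenter="$2"
  kubectl patch vsphereCSIConfig "$config_name" -n default --type merge \
    -p "{\"spec\":{\"vsphereCSI\":{\"config\":{\"datacenter\":\"${datacenter}\"}}}}"
}

# Steps 4-5 (workload cluster context): restart the CSI workloads, then
# block until each rollout has completed.
restart_csi() {
  kubectl rollout restart deployment vsphere-csi-controller -n kube-system
  kubectl rollout restart daemonset vsphere-csi-node -n kube-system
  kubectl rollout status deployment vsphere-csi-controller -n kube-system
  kubectl rollout status daemonset vsphere-csi-node -n kube-system
}
```

For example, with the management cluster context active, run set_csi_datacenter test-cluster-1 /new-datacenter-id; then switch to the workload cluster context and run restart_csi.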