This article helps customers who encounter issues with the vsphere-csi-node in Kubernetes clusters after an upgrade. It explains how to identify, understand, and resolve the error caused by a duplicate csi-resizer container.
Customers upgrading a TKG Kubernetes cluster may encounter an issue with the vsphere-csi-node. The key indicator is an error in the vsphere-csi app status that reports client-side throttling and an invalid deployment caused by a duplicate container entry.
This issue arises from a change in how the csi-resizer is deployed between TKG versions. In TKG 1.3, the csi-resizer was not included automatically and had to be added through an overlay. From TKG 1.4 onwards, the csi-resizer is added automatically. When a cluster is upgraded from a version prior to 1.4, the csi-resizer added manually through the overlay duplicates the one that is now included automatically. The resulting configuration conflict causes deployment errors.
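The conflict described above amounts to the same container name appearing twice in one pod template. The following is a minimal sketch of how that could be checked, not part of any product tooling. It assumes the workload manifest has been exported as JSON (for example with `kubectl get ... -o json`) and loaded into a dictionary; the function name `duplicate_containers` and the sample manifest are hypothetical and for illustration only.

```python
from collections import Counter


def duplicate_containers(manifest: dict) -> list:
    """Return the container names that appear more than once in a pod template."""
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    # Count each container name; entries without a name are ignored.
    counts = Counter(c["name"] for c in containers if "name" in c)
    return sorted(name for name, seen in counts.items() if seen > 1)


# Illustrative manifest only; the container list in a real cluster will differ.
example_manifest = {
    "kind": "Deployment",
    "metadata": {"name": "example-workload"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "csi-attacher"},
                    {"name": "csi-resizer"},  # added automatically from TKG 1.4
                    {"name": "csi-resizer"},  # added again by the old overlay
                ]
            }
        }
    },
}

duplicates = duplicate_containers(example_manifest)
print(duplicates)  # ['csi-resizer']
```

An empty result means no container name is repeated in the manifest that was checked.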
The csi-resizer container is included by default from TKG 1.4 onwards. Refer to the workaround section to remove the duplicate container.
To mitigate this issue, back up the secret named "<cluster-name>-vsphere-csi-addon" and then remove the overlays.yaml section from the vsphere-csi-addon.yaml secret on the management cluster. This allows the addon controller to propagate the necessary changes to the workload clusters. Note that the kapp reconcile interval has been adjusted to 10 minutes, so allow approximately 10 minutes for reconciliation to complete.
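The back-up-then-edit step can be sketched as follows. This is an illustration under stated assumptions, not the documented procedure: it assumes the secret has been exported as JSON (for example with `kubectl get secret ... -o json`) and that the overlay is stored under a key named `overlays.yaml` in the secret's `data` or `stringData` field. The function name `strip_overlay`, the cluster name `my-cluster`, and the placeholder values are hypothetical.

```python
import copy


def strip_overlay(secret: dict, key: str = "overlays.yaml"):
    """Return (backup, updated): an untouched copy and a copy without the overlay key."""
    backup = copy.deepcopy(secret)   # keep the original for rollback
    updated = copy.deepcopy(secret)
    for field in ("data", "stringData"):
        section = updated.get(field)
        if isinstance(section, dict):
            section.pop(key, None)   # no error if the key is already absent
    return backup, updated


# Hypothetical secret content; real values are base64-encoded YAML.
example_secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "my-cluster-vsphere-csi-addon"},
    "data": {
        "values.yaml": "PLACEHOLDER_VALUES",
        "overlays.yaml": "PLACEHOLDER_OVERLAY",
    },
}

backup, updated = strip_overlay(example_secret)
print(sorted(updated["data"]))  # ['values.yaml']
```

The backup copy should be saved to a file before the updated secret is written back to the management cluster with the operator's usual tooling.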
This configuration conflict significantly impacts the operational capabilities of the cluster, causing difficulties in provisioning new PVs and potential disruptions in storage access for pods. It also creates concerns about upgrading other clusters due to the risk of similar issues.