vsphere-csi package reconciliation fails after workload cluster upgrade due to duplicate csi-resizer

Article ID: 327445

Products

VMware

Issue/Introduction

This article helps customers whose vsphere-csi package fails to reconcile after a Kubernetes cluster upgrade. It explains how to identify, understand, and resolve the failure caused by a duplicate csi-resizer container.


Symptoms:

After upgrading a TKG Kubernetes cluster, the vsphere-csi app may fail to reconcile. The app status shows client-side throttling and reports the vsphere-csi-controller Deployment as invalid because of a duplicate container entry. Key indicators are:

 

  • Error messages indicating a duplicate "csi-resizer" container in the deployment of the vsphere-csi-controller.
  • Inability to provision new Persistent Volumes (PVs).
  • Potential disruptions to pod storage access upon restarts.
  • vsphere-csi app status: Deployment.apps "vsphere-csi-controller" is invalid: spec.template.spec.containers[7].name: Duplicate value: "csi-resizer" (reason: Invalid)
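
To confirm that an app status message matches this symptom, the duplicate container name and its index can be pulled out of the text. The following is a minimal sketch, not an official tool: the function name find_duplicate_container is hypothetical, and the pattern assumes the message has the same shape as the status line quoted above, with or without backslash-escaped quotes.

```python
import re

# Matches the "Duplicate value" validation error from the app status.
# Quotes may arrive escaped (\"name\") or plain ("name"), so the backslash is optional.
DUPLICATE_RE = re.compile(
    r'spec\.template\.spec\.containers\[(\d+)\]\.name:\s*'
    r'Duplicate value:\s*\\?"([^"\\]+)\\?"'
)


def find_duplicate_container(status_text):
    """Return (index, name) of the duplicated container, or None if not found."""
    # Status output often carries a literal backslash-n instead of a real newline.
    normalized = status_text.replace("\\n", " ")
    match = DUPLICATE_RE.search(normalized)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


status = (
    r'Deployment.apps \"vsphere-csi-controller\" is invalid: '
    r'spec.template.spec.containers[7].name:\n Duplicate value: '
    r'\"csi-resizer\" (reason: Invalid)'
)
print(find_duplicate_container(status))
```

A result naming csi-resizer indicates the condition described in this article; None means the status text reports a different failure.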

 


Cause

This issue is caused by a change in how the csi-resizer container is delivered between TKG versions. In version 1.3, the csi-resizer was not included automatically and had to be added with an overlay. From version 1.4 onwards, the csi-resizer is included automatically. When a cluster is upgraded from a version prior to 1.4, the overlay still adds its own csi-resizer on top of the one that is now included by default. The vsphere-csi-controller Deployment then contains two containers with the same name, which Kubernetes rejects as invalid.
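
The conflict can be illustrated with a small model of the container list. This is a simplified sketch under stated assumptions: the helper names and the abbreviated container lists are hypothetical, and the real overlay mechanism and manifests contain more detail than shown here.

```python
from collections import Counter


def apply_append_overlay(containers, extra):
    """Mimic an overlay that appends one container to the pod template."""
    return containers + [extra]


def duplicate_names(containers):
    """Return container names that appear more than once, sorted."""
    counts = Counter(c["name"] for c in containers)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical, abbreviated container lists for illustration only.
base_1_3 = [{"name": "csi-attacher"}, {"name": "vsphere-csi-controller"}]
base_1_4 = base_1_3 + [{"name": "csi-resizer"}]  # resizer now included by default

overlay_container = {"name": "csi-resizer"}  # what the 1.3-era overlay adds

print(duplicate_names(apply_append_overlay(base_1_3, overlay_container)))  # []
print(duplicate_names(apply_append_overlay(base_1_4, overlay_container)))  # ['csi-resizer']
```

The same overlay that was required on 1.3 produces a duplicate name once the base already contains csi-resizer, which is why removing the overlay resolves the error.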

Resolution

The csi-resizer container is included by default from TKG 1.4 onwards, so the overlay that adds it is no longer needed. Follow the workaround below to remove the duplicate container.

 


Workaround:

To mitigate this issue, first back up the secret named "<cluster-name>-vsphere-csi-addon" on the management cluster, then remove the overlays.yaml section from that secret (vsphere-csi-addon.yaml). The addon controller then propagates the change to the workload cluster. The kapp reconcile interval is set to 10 minutes, so allow approximately 10 minutes for the reconciliation to complete.
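
The backup-then-remove step can be sketched as follows for a secret that has been exported as JSON (for example with kubectl get secret and the -o json output option). This is an illustration, not a supported procedure. It assumes the overlay is stored under a key named overlays.yaml in the secret's data or stringData field; the function name strip_overlay, the example secret name, and the values.yaml key are hypothetical. Verify the actual key names in your secret before changing anything, and apply the patched manifest with your usual tooling.

```python
import base64
import copy
import json

OVERLAY_KEY = "overlays.yaml"  # assumed key name, taken from the section named in the text


def strip_overlay(secret_json):
    """Return (backup, patched): the untouched secret and a copy without the overlay."""
    secret = json.loads(secret_json)
    backup = copy.deepcopy(secret)   # keep this so the change can be reverted
    patched = copy.deepcopy(secret)
    for field in ("data", "stringData"):
        section = patched.get(field)
        if isinstance(section, dict):
            section.pop(OVERLAY_KEY, None)  # no error if the key is absent
    return backup, patched


# Hypothetical secret content for illustration only.
example = json.dumps({
    "kind": "Secret",
    "metadata": {"name": "my-cluster-vsphere-csi-addon"},
    "data": {
        "values.yaml": base64.b64encode(b"# values").decode(),
        OVERLAY_KEY: base64.b64encode(b"# overlay adding csi-resizer").decode(),
    },
})

backup, patched = strip_overlay(example)
print(sorted(backup["data"]))   # ['overlays.yaml', 'values.yaml']
print(sorted(patched["data"]))  # ['values.yaml']
```

Keeping the backup as a separate object means the original secret can be restored unchanged if the reconciliation does not recover after the wait.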


Additional Information

Impact/Risks:

While the configuration conflict is present, new PVs cannot be provisioned and pods may lose access to their storage when they restart. Other clusters that were upgraded from a version prior to 1.4 with the same overlay in place are at risk of the same failure.