Telco Cloud Automation 2.x and 3.x versions
A cluster with more than 100 nodes is slow to create multiple Persistent Volume Claims (PVCs) in parallel (around 50-100). The CSI logs show a "context deadline exceeded" error, which indicates that volume creation is not completing within the 5-minute timeout period.
For example, in the logs for a PVC named pvc-abc123, the first CreateVolume request occurred at 12:23 on October 10, 2024. After 5 minutes, a "context deadline exceeded" error was logged, followed by additional requests that led to a successful PVC creation at 12:38. The delay appears to stem from the time taken to fetch the shared datastore, given the large number of nodes and the accessibility requirements specified for PVC creation.
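The log check described above can be sketched as a small shell filter. This is a minimal sketch, not an official tool: the function name filter_pvc_timeouts is hypothetical, and it assumes the PVC name and the error text appear on the same log line, which may not hold for every CSI log format.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that mention the given PVC
# and contain the "context deadline exceeded" error text.
# Assumes both strings appear on the same log line.
filter_pvc_timeouts() {
  grep -F -- "$1" | grep -F "context deadline exceeded"
}

# Possible usage (the namespace is a placeholder, not taken from this article):
#   kubectl logs deployment/vsphere-csi-controller -n <namespace> --all-containers \
#     | filter_pvc_timeouts pvc-abc123
```

Each matching line carries its own timestamp, so the gap between the first CreateVolume request and the final success can be read directly from the filtered output.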
Procedure to update vsphere-csi-controller for creating multiple Persistent Volume Claims (PVCs) in parallel
Note: Log in to the control plane node of the cluster where the changes are being made.
Step 1: Create the Overlay File
Create a file named vsphere-csi-fix.yaml with the following content:
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name":"vsphere-csi-controller"}})
---
spec:
  template:
    spec:
      containers:
      #@overlay/match by="name"
      - name: csi-provisioner
        args:
        #@overlay/append
        - --worker-threads=10
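Because YAML indentation is easy to lose when copying, one option is to generate the file from the shell. The sketch below is an illustration only: the WORKER_THREADS variable is an assumption introduced here (defaulting to the value shown in the overlay), and the final grep is just a sanity check that the argument line was written.

```shell
#!/bin/sh
# Assumed variable: number of csi-provisioner worker threads to request.
WORKER_THREADS="${WORKER_THREADS:-10}"

# Write the ytt overlay with its indentation preserved.
cat > vsphere-csi-fix.yaml <<EOF
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name":"vsphere-csi-controller"}})
---
spec:
  template:
    spec:
      containers:
      #@overlay/match by="name"
      - name: csi-provisioner
        args:
        #@overlay/append
        - --worker-threads=${WORKER_THREADS}
EOF

# Sanity check: confirm the appended argument is present in the file.
grep -- "--worker-threads=${WORKER_THREADS}" vsphere-csi-fix.yaml
```

Running it with a different value, for example WORKER_THREADS=20 sh script.sh (script.sh being whatever name the sketch is saved under), writes the same overlay with that value.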
Step 2: Create the Secret
Run the following command to create a Kubernetes secret from the overlay file:
kubectl create secret generic vsphere-csi-fix -n tkg-system -o yaml --dry-run=client --from-file=vsphere-csi-fix.yaml | kubectl apply -f -
Step 3: Annotate the Package Installation
Find the vsphere-csi pkgi name by running:
kubectl get pkgi -A
Then add the annotation to the package installation:
kubectl annotate pkgi <vsphere-csi-pkgi-name> -n tkg-system ext.packaging.carvel.dev/ytt-paths-from-secret-name.0=vsphere-csi-fix
Step 4: Pause the Application
Pause the vsphere-csi pkgi by executing:
kubectl patch pkgi <vsphere-csi-pkgi-name> -n tkg-system --type merge -p '{"spec":{"paused":true}}'
Step 5: Enable Reconciliation
After making the changes, re-enable reconciliation by unpausing the same pkgi:
kubectl patch pkgi <vsphere-csi-pkgi-name> -n tkg-system --type merge -p '{"spec":{"paused":false}}'
Step 6: Verify the Changes
To verify the changes, describe the deployment and check that worker-threads shows the value set in the overlay file:
kubectl describe deployment vsphere-csi-controller -n tkg-system | grep worker-threads
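For scripted checks, the value can be pulled out of the command output instead of being read by eye. This is a minimal sketch: the function name extract_worker_threads is hypothetical, and it assumes the argument appears in the form --worker-threads=<number>.

```shell
#!/bin/sh
# Hypothetical helper: print the first numeric --worker-threads value
# found in the text on stdin (prints nothing if the argument is absent).
extract_worker_threads() {
  sed -n 's/.*--worker-threads=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Possible usage, reusing the describe command from this step:
#   kubectl describe deployment vsphere-csi-controller -n tkg-system \
#     | extract_worker_threads
```

An empty result suggests the overlay has not been applied yet, for example because the pkgi is still paused or has not reconciled.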
This procedure will help ensure the vsphere-csi-controller deployment is updated correctly.