Addressing Slowness in PVC Deployments in Telco Cloud Automation 2.x and 3.x Versions

Article ID: 380517


Products

VMware Telco Cloud Automation

Issue/Introduction

Newly created clusters and Persistent Volume Claims (PVCs) experience slow creation times, particularly in environments that use the Multi-Zone Storage Class configuration in TCA.

Environment

VMware Telco Cloud Automation 2.x and 3.x versions

Cause

A cluster of approximately 100 or more nodes experiences slowness when creating multiple Persistent Volume Claims (PVCs) in parallel (around 50-100). The CSI logs show a "context deadline exceeded" error, indicating that volume creation is not completing within the 5-minute timeout period.

For example, in the logs for a PVC named pvc-abc123, the first CreateVolume request occurred at 12:23 on October 10, 2024. After 5 minutes, a "context deadline exceeded" error was logged, followed by additional requests that led to a successful PVC creation at 12:38. This delay stems from the time taken to fetch the shared datastore, given the large number of nodes and the accessibility requirements specified for PVC creation.
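
To confirm this symptom, search the csi-provisioner container logs for the timeout error. The command below is a sketch that assumes the vsphere-csi-controller deployment runs in the kube-system namespace (the usual default in TKG-based clusters); adjust the namespace if your environment differs:


kubectl logs deployment/vsphere-csi-controller -n kube-system -c csi-provisioner --since=24h | grep -i "context deadline exceeded"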

Resolution

Procedure to update the vsphere-csi-controller deployment to tune how many Persistent Volume Claims (PVCs) are provisioned in parallel.


Note: Log in to the control plane node of the cluster where the changes are being made.
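
On clusters deployed through TCA, the control plane nodes can typically be reached over SSH as the capv user (a TKG convention; verify the correct user for your environment). The node address below is a placeholder:


ssh capv@<control-plane-node-ip>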


Step 1: Create the Overlay File


Create a file named vsphere-csi-fix.yaml with the following content:


#@ load("@ytt:overlay", "overlay")
#! Match the vsphere-csi-controller Deployment rendered by the package.
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name":"vsphere-csi-controller"}})
---
spec:
  template:
    spec:
      containers:
      #! Match the csi-provisioner container in the list by its name field.
      #@overlay/match by="name"
      - name: csi-provisioner
        args:
          #! Append a flag that caps concurrent volume operations.
          #@overlay/append
          - --worker-threads=10
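
The --worker-threads flag caps how many volume create and delete operations the csi-provisioner sidecar processes concurrently, which helps each CreateVolume call complete its shared-datastore lookup within the 5-minute timeout. To inspect the arguments currently set on the container before applying the overlay, a command such as the following can be used (again assuming the deployment runs in kube-system):


kubectl get deployment vsphere-csi-controller -n kube-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-provisioner")].args}'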

Step 2: Create the Secret


Run the following command to create a Kubernetes secret from the overlay file:


kubectl create secret generic vsphere-csi-fix -n tkg-system -o yaml --dry-run=client --from-file=vsphere-csi-fix.yaml | kubectl apply -f -
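
To confirm the secret was created and contains the overlay file, inspect it with:


kubectl get secret vsphere-csi-fix -n tkg-system -o yaml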

Step 3: Annotate the Package Installation


First, fetch the vsphere-csi pkgi name:


kubectl get pkgi -A


Then add the annotation to the package installation:


kubectl annotate pkgi <vsphere-csi-pkgi-name> -n tkg-system ext.packaging.carvel.dev/ytt-paths-from-secret-name.0=vsphere-csi-fix
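
This annotation instructs kapp-controller to render the package with the ytt overlay stored in the secret. To verify the annotation was set, list the package installation's annotations:


kubectl get pkgi <vsphere-csi-pkgi-name> -n tkg-system -o jsonpath='{.metadata.annotations}'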

Step 4: Pause the Package Installation


Pause the vsphere-csi package installation by executing:


kubectl patch pkgi <vsphere-csi-pkgi-name> -n tkg-system --type merge -p '{"spec":{"paused":true}}'
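
To confirm the package installation is paused, check its spec:


kubectl get pkgi <vsphere-csi-pkgi-name> -n tkg-system -o jsonpath='{.spec.paused}'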

Step 5: Enable Reconciliation


After making the necessary changes, enable reconciliation by un-pausing the corresponding App resource:


kubectl patch app <vsphere-csi-pkgi-name> -n tkg-system --type merge -p '{"spec":{"paused":false}}'
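
Reconciliation can take a few minutes. The App status can be watched until it reports a successful reconcile:


kubectl get app <vsphere-csi-pkgi-name> -n tkg-system -w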

Step 6: Verify the Changes


To verify the changes, describe the deployment and confirm that the worker-threads argument was appended:


kubectl describe deployment vsphere-csi-controller -n kube-system | grep worker-threads
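
If the overlay was applied, the grep should return the appended argument (illustrative output, based on the value set in the overlay above):


      --worker-threads=10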

This procedure will help ensure the vsphere-csi-controller deployment is updated correctly.