Resolve NAPP Data Storage 'failed to scale up DATA_STORAGE in stage resize persistent volume: failed to validate resizing: pvc and pv not match' Error

Article ID: 339715

Products

VMware NSX Networking
VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

Symptoms:

You might experience any of the following symptoms:

  • In the NSX NAPP UI:
    • Under Core Services > Data Storage, you will see an alert with the message 'failed to scale up DATA_STORAGE in stage resize persistent volume: failed to validate resizing: pvc and pv not match'
  • In the vSphere Client:

    • On some ESXi hosts, under Recent Tasks, you may see recurring disk-extension errors such as 'Failed to extend disk'
  • In Kubernetes:

    • In the Supervisor Cluster vsphere-csi-controller Pod logs, you might see the error:

    2023-07-07T12:42:33.265576301Z stderr F {"level":"error","time":"2023-07-07T12:42:33.265576301Z","caller":"wcp/controller.go:941","msg":"failed to expand volume: \"<volume ID>\" to size: <size> err failed to extend volume: \"<volume ID>\", fault: \"(*types.LocalizedMethodFault)(0xc000c9ee80)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n BaseMethodFault: (types.BaseMethodFault) <nil>,\\n Reason: (string) (len=16) \\\"VSLM task failed\\\"\\n },\\n LocalizedMessage: (string) (len=32) \\\"CnsFault error: VSLM task failed\\\"\\n})\\n\", ...}
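
    To review these logs from a kubectl session against the Supervisor Cluster, a command along the following lines can be used. This is a generic sketch: the vmware-system-csi namespace and the vsphere-csi-controller deployment and container names are the usual defaults for vSphere with Tanzu and may differ in your environment.

    kubectl logs -n vmware-system-csi deployment/vsphere-csi-controller -c vsphere-csi-controller --tail=200 | grep "failed to expand volume"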



Environment

VMware vSphere 8.0 with Tanzu
VMware NSX-T
VMware vSphere 7.0 with Tanzu

Cause

This can happen when the disk expansion takes longer than the scale-up operation anticipates and the operation times out, so the MinIO Pods are not restarted.

If the MinIO Pods are not restarted by the scale-up process, they do not pick up the resized PersistentVolumeClaim and the new capacity is not made available to the NSX Application Platform.
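
Before restarting the Pods, the mismatch can be confirmed by comparing the storage requested by the MinIO PersistentVolumeClaims with the capacity of the PersistentVolumes they are bound to. The commands below are a generic sketch; <minio-pvc-name> and <bound-pv-name> are placeholders for the actual object names in your environment:

    # List the MinIO PVCs, their bound PVs and the capacity currently reported
    kubectl get pvc -n nsxi-platform | grep minio

    # Compare the size requested by a PVC with the capacity of its bound PV;
    # a PVC requesting more storage than its PV provides indicates the resize did not complete
    kubectl get pvc <minio-pvc-name> -n nsxi-platform -o jsonpath='{.spec.resources.requests.storage}{"\n"}'
    kubectl get pv <bound-pv-name> -o jsonpath='{.spec.capacity.storage}{"\n"}'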

Resolution

This issue can be resolved by restarting the MinIO Pods.

The MinIO Pods can be restarted either from any of the NSX Manager nodes (Option #1) or from a 'kubectl' session (Option #2), using the steps below.

Option #1: Running the commands from the NSX Manager nodes:

  1. Get the MinIO StatefulSet:

    napp-k get statefulset -n nsxi-platform | grep minio

  2. From the previous output, take note of the number of replicas the StatefulSet currently has.
  3. Scale the MinIO StatefulSet down to 0 replicas:

    napp-k scale statefulset minio -n nsxi-platform --replicas=0

  4. Scale the StatefulSet back up to the original number of replicas (N):

    napp-k scale statefulset minio -n nsxi-platform --replicas=N

    The number of replicas (N) was noted in step 2; there are typically 4 MinIO replicas.

    Example:
    napp-k scale statefulset minio -n nsxi-platform --replicas=4

  5. Wait until all the MinIO Pods are in a Running state:

    napp-k get pods -n nsxi-platform | grep minio

  6. Refresh the NSX NAPP UI and verify that the error no longer appears
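
    Optionally, you can also confirm that the expanded capacity is now reflected on the MinIO PersistentVolumeClaims. This assumes 'napp-k' accepts the same arguments as the 'kubectl' commands used above; the exact PVC names vary per environment:

    napp-k get pvc -n nsxi-platform | grep minio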


Option #2: Running the commands from a 'kubectl' session:

  1. Get the MinIO StatefulSet:

    kubectl get statefulset -n nsxi-platform | grep minio

  2. From the previous output, take note of the number of replicas the StatefulSet currently has.
  3. Scale the MinIO StatefulSet down to 0 replicas:

    kubectl scale statefulset minio -n nsxi-platform --replicas=0

  4. Scale the StatefulSet back up to the original number of replicas (N):

    kubectl scale statefulset minio -n nsxi-platform --replicas=N

    The number of replicas (N) was noted in step 2; there are typically 4 MinIO replicas.

    Example:
    kubectl scale statefulset minio -n nsxi-platform --replicas=4

  5. Wait until all the MinIO Pods are in a Running state:

    kubectl get pods -n nsxi-platform | grep minio

  6. Refresh the NSX NAPP UI and verify that the error no longer appears
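
    If the alert does not clear, the state of the volume expansion can be inspected on the affected PersistentVolumeClaim. This is a generic check; <minio-pvc-name> is a placeholder for the actual PVC name in your environment:

    kubectl describe pvc <minio-pvc-name> -n nsxi-platform

    In the Conditions and Events sections of the output, a lingering 'FileSystemResizePending' condition indicates that the filesystem expansion has not yet completed on the node.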