Error: "failed to get volumeid from volumemigrationservice" when pods fail to start on a TKGI cluster

Article ID: 381531


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

  • Users see errors such as "failed to get volumeid from volumemigrationservice" when describing pods that are stuck in ContainerCreating status.
  • The same errors appear in /var/vcap/sys/log/csi-controller/csi-syncer.stderr.log on the master nodes. Example:

    Warning  FailedAttachVolume  8m4s (x44501 over 15h)  attachdetach-controller  AttachVolume.Attach failed for volume "<PV_NAME>" : rpc error: code = Internal desc = failed to get VolumeID from volumeMigrationService for volumePath: "[DATASTORE_NAME] kubevols/<DATASTORE_ID>-dynamic-<PV_NAME>.vmdk"

  • The PersistentVolumes that fail to attach and mount are visible in the vSphere Container Volumes view.
  • The VolumeAttachment for each affected PV is present, but shows Attached: False.
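
The VolumeAttachment check above can be scripted when many volumes are involved. The following is a minimal sketch, not an official tool: unattached_pvs is a hypothetical helper name, and it assumes the default table printed by kubectl get volumeattachments, with the columns NAME, ATTACHER, PV, NODE, ATTACHED, and AGE.

```shell
# Hypothetical helper: read the `kubectl get volumeattachments` table on stdin
# and print the PV column of every row whose ATTACHED column is "false".
# The header row is skipped implicitly because its fifth field is "ATTACHED".
unattached_pvs() {
  awk '$5 == "false" { print $3 }'
}
```

On a live cluster it could be used as: kubectl get volumeattachments | unattached_pvs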

Environment

Tanzu Kubernetes Grid Integrated Edition (TKGI) environments on version 1.18.0 or earlier that have migrated Persistent Volumes from the in-tree vSphere Cloud Provider (VCP) to the out-of-tree CSI driver.

Cause

This failure occurs because the cnsvspherevolumemigration custom resource (CR) is deleted when the underlying datastore becomes inaccessible in vCenter. This is a known issue reported in GitHub vsphere-csi-driver issue 2488 and GitHub vsphere-csi-driver issue 2470.

 

See the vSphere CSI 3.1.0 release notes for the fix details.

Resolution

This issue is resolved in TKGI 1.18.1 and later versions, which include vSphere CSI driver 3.1.0.

 

To work around this failure, create a new CnsVSphereVolumeMigration CR for each impacted volume. Use the following template for reference:

# Get the values from the error message
# kubectl describe pod OR kubectl describe pvc
VOLUME_ID=82e6207f-93d1-48b0-ba7f-307adf2d69d8
DATASTORE_NAME="[datastore-102-1]"
DATASTORE_ID=66322504-ae3965ce-ed22-0276c6011e29
PV_NAME=pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5

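# YAML template
# Replace each ${...} placeholder with the value set above before saving the file;
# kubectl apply does not expand shell variables inside a manifest.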
---
apiVersion: cns.vmware.com/v1alpha1
kind: CnsVSphereVolumeMigration
metadata:
  name: ${VOLUME_ID}
spec:
  protectvolumefromvmdelete: true
  volumeid: ${VOLUME_ID}
  volumepath: "${DATASTORE_NAME} kubevols/${DATASTORE_ID}-dynamic-${PV_NAME}.vmdk"


# Apply
kubectl apply -f YOUR_TEMPLATE.yaml

# Check
kubectl get cnsvspherevolumemigrations.cns.vmware.com
 
 
Replace VOLUME_ID, DATASTORE_NAME, DATASTORE_ID, and PV_NAME with the values gathered from the error message and from the kubectl describe output for the affected PVC.
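
Because the template uses shell-style placeholders, one way to fill it in is to let the shell render the manifest. The following is a minimal sketch under that assumption: render_migration_cr is a hypothetical helper name, not part of any VMware tooling, and the values passed to it are the example values from the template above.

```shell
# Hypothetical helper: print a CnsVSphereVolumeMigration manifest to stdout.
# Arguments: $1 = VOLUME_ID, $2 = DATASTORE_NAME (with square brackets),
#            $3 = DATASTORE_ID, $4 = PV_NAME
render_migration_cr() {
  cat <<EOF
---
apiVersion: cns.vmware.com/v1alpha1
kind: CnsVSphereVolumeMigration
metadata:
  name: ${1}
spec:
  protectvolumefromvmdelete: true
  volumeid: ${1}
  volumepath: "${2} kubevols/${3}-dynamic-${4}.vmdk"
EOF
}

# Render the manifest for the example values used in the template.
render_migration_cr \
  82e6207f-93d1-48b0-ba7f-307adf2d69d8 \
  "[datastore-102-1]" \
  66322504-ae3965ce-ed22-0276c6011e29 \
  pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5
```

Review the rendered manifest first. It can then be redirected to a file for kubectl apply -f, or piped directly to kubectl apply -f - which reads the manifest from standard input.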

Additional Information

When the Kubernetes cluster has a large number of Persistent Volumes, you can collect the required values in bulk ahead of time so that creating the CRs can be automated.

 

# Approach 1: VOLUME_ID / PV_NAME / DATASTORE_ID

  • VOLUME_ID is 82e6207f-93d1-48b0-ba7f-307adf2d69d8
  • PV_NAME is pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5
  • DATASTORE_ID is 66322504-ae3965ce-ed22-0276c6011e29
    • You can also find DATASTORE_ID in vCenter (vCenter web UI --> Datastores --> Summary --> URL)
# Log in to vCenter
ssh root@VCENTER
shell
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT volume_id,volume_name,datastore FROM cns.volume_info;"
#>               volume_id               |               volume_name                |                        datastore
#> --------------------------------------+------------------------------------------+---------------------------------------------------------
#>  82e6207f-93d1-48b0-ba7f-307adf2d69d8 | pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5 | ds:///vmfs/volumes/66322504-ae3965ce-ed22-0276c6011e29/
...
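
The query output above can be reduced to the three values the template needs. The following is a minimal sketch: parse_cns_rows is a hypothetical helper name, and it assumes the default aligned psql table format shown above (fields separated by "|", datastore in the form ds:///vmfs/volumes/<DATASTORE_ID>/), without the "#>" prefix used here for display.

```shell
# Hypothetical helper: read psql table output on stdin and print one line per
# volume in the form "VOLUME_ID PV_NAME DATASTORE_ID".
# Header, separator, and row-count lines are ignored because they do not
# contain a ds:///vmfs/volumes/ URL in the third field.
parse_cns_rows() {
  awk -F'|' '
    NF == 3 && index($3, "ds:///vmfs/volumes/") > 0 {
      gsub(/[ \t]/, "", $1)        # strip padding around volume_id
      gsub(/[ \t]/, "", $2)        # strip padding around volume_name
      split($3, parts, "/")        # "ds:", "", "", "vmfs", "volumes", "<id>", ...
      print $1, $2, parts[6]
    }'
}
```

It could be used by piping the psql command shown above into parse_cns_rows.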

 

# Approach 2: VOLUME_ID / PV_NAME

Install govc, the command-line tool from the govmomi project.

  • VOLUME_ID is 82e6207f-93d1-48b0-ba7f-307adf2d69d8
  • PV_NAME is pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5
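# DATASTORE_NAME here is the plain datastore name without square brackets, for example datastore-102-1
DATASTORE_PATH=/${DATACENTER_NAME}/datastore/${DATASTORE_NAME}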
govc disk.ls -ds ${DATASTORE_PATH}
#> 82e6207f-93d1-48b0-ba7f-307adf2d69d8    pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5
...
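
To look up the VOLUME_ID for one specific PV in this listing, the output can be filtered by name. The following is a minimal sketch: lookup_volume_id is a hypothetical helper name, and it assumes the two-column output shown above (disk ID followed by disk name, separated by whitespace).

```shell
# Hypothetical helper: read `govc disk.ls` output on stdin and print the
# VOLUME_ID (first column) of the row whose name (second column) equals $1.
lookup_volume_id() {
  awk -v name="$1" '$2 == name { print $1 }'
}
```

For example: govc disk.ls -ds ${DATASTORE_PATH} | lookup_volume_id pvc-00b10e48-aa62-49ff-847b-29b2fd555cd5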

 

# Approach 3: VOLUME_ID / DATASTORE_NAME

  • VOLUME_ID is 82e6207f-93d1-48b0-ba7f-307adf2d69d8
  • DATASTORE_NAME is datastore-102-1
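# DATASTORE_PATH is defined as in Approach 2; -L prints the disk backing path instead of the disk name
govc disk.ls -ds ${DATASTORE_PATH} -L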
#> 82e6207f-93d1-48b0-ba7f-307adf2d69d8  [datastore-102-1] fcd/_00a6/996e166c381f48a38a37b68454ad8dad.vmdk
...
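
The bracketed datastore name can be pulled out of this listing in the form the template expects. The following is a minimal sketch: parse_disk_backing is a hypothetical helper name, and it assumes the output format shown above (disk ID, then "[DATASTORE_NAME] path"). Only the datastore name is taken from the backing path; the fcd/... path shown here is not the kubevols path used in the CR.

```shell
# Hypothetical helper: read `govc disk.ls -L` output on stdin and print
# "VOLUME_ID<TAB>[DATASTORE_NAME]" for each row. The square brackets are kept
# because the volumepath field in the template uses the bracketed form.
parse_disk_backing() {
  awk 'match($0, /\[[^]]*\]/) { print $1 "\t" substr($0, RSTART, RLENGTH) }'
}
```

A tab is used as the output separator because datastore names may contain spaces.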