"The operation is not allowed in the current state" error when trying to attach a Persistent Volume to a Kubernetes worker node

Products

VMware Telco Cloud Automation
VMware vSAN

Issue/Introduction

Symptoms:
All of the following conditions should be met for the issue to match this article. If any of them are not met, the cause may be different and should be investigated further.

First Class Disks (FCDs) provisioned via the vSphere Container Storage Plug-in (vSphere CSI driver) fail to attach to Virtual Machines.
Running a describe on the impacted Persistent Volume in Kubernetes may show the following event:

CnsFault error: CNS: Failed to attach disk when calling AttachDisk:Fault cause: vim.fault.InvalidState\\n\"\n})\n"

Checking the Cloud Native Storage (CNS) logs at /var/log/vmware/vsan-health/vsanvcmgmtd.log at the time of the failure shows the following error:

2022-03-21T14:24:53.428Z error vsanvcmgmtd[08382] [vSAN@6876 sub=FcdService opId=05692c2c] CNS: Failed to attach disk when calling AttachDisk:N3Vim5Fault12InvalidState9ExceptionE(Fault cause: vim.fault.InvalidState
2022-03-21T14:24:53.430Z error vsanvcmgmtd[08382] [vSAN@6876 sub=VolumeManager opId=05692c2c] CNS: Failed to attach disk to vm: vim.VirtualMachine:vm-123456 with err: N3cns12CnsExceptionE(CNS: Failed to attach disk when calling AttachDisk:Fault cause: vim.fault.InvalidState
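
The opId in the vsanvcmgmtd.log entries above (05692c2c) also appears as the prefix of the opID in the vpxd.log entry in this article (05692c2c-83), so it may help to correlate the two logs by that value. The following is a minimal Python sketch of that idea, not a supported tool. The function names are hypothetical, and it assumes each log entry sits on a single line in the format shown in these excerpts.

```python
import re

# Matches both "opId=..." (vsanvcmgmtd) and "opID=..." (vpxd), as seen in the excerpts.
OP_ID = re.compile(r"\bop[Ii][Dd]=([0-9A-Za-z-]+)")
VM_REF = re.compile(r"vim\.VirtualMachine:(vm-\d+)")


def failed_attach_op_ids(cns_lines):
    """Return {opId: VM reference or None} for AttachDisk failures with InvalidState."""
    failures = {}
    for line in cns_lines:
        if "Failed to attach disk" not in line or "vim.fault.InvalidState" not in line:
            continue
        op = OP_ID.search(line)
        if not op:
            continue
        vm = VM_REF.search(line)
        # Keep a VM reference once one has been seen for this opId.
        failures[op.group(1)] = vm.group(1) if vm else failures.get(op.group(1))
    return failures


def related_vpxd_lines(vpxd_lines, op_id):
    """Return vpxd lines whose opID equals op_id or extends it with a '-suffix'."""
    related = []
    for line in vpxd_lines:
        m = OP_ID.search(line)
        if m and (m.group(1) == op_id or m.group(1).startswith(op_id + "-")):
            related.append(line)
    return related
```

Passing the lines of vsanvcmgmtd.log to failed_attach_op_ids and then each returned opId to related_vpxd_lines should narrow vpxd.log down to the matching attachDisk task entries.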

The 'vim.fault.InvalidState' error is also reported in /var/log/vmware/vpxd/vpxd.log as follows:

2022-03-21T14:24:53.181Z info vpxd[04097] [Originator@6876 sub=Default opID=05692c2c-83] [VpxLRO] -- ERROR task-4328604 -- vm-123456 -- vim.VirtualMachine.attachDisk: vim.fault.InvalidState:
--> Result:
--> (vim.fault.InvalidState) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>
--> msg = "The operation is not allowed in the current state."
--> }
--> Args:
-->
--> Arg diskId:
--> (vim.vslm.ID) {
--> id = "9476afb6-b7bd-46a5-8538-0c203bdf636c"
--> }
--> Arg datastore:
--> 'vim.Datastore:datastore-5002'
--> Arg controllerKey:
--> 1000
--> Arg unitNumber:
-->

The hostd logs on the ESXi host where the worker node VM resides also show an error pointing to a CBT mismatch:

2024-10-25T12:03:55.267Z verbose hostd[43843324] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b00233abe7dd1c-9317d6ffffed5f6f/a97c1b67-1219-cd8b-fa6e-78ac44a9a480/p2-services-cluster01-worker-pool-bpkdz-6fd9b5794fxzhgb6-24fmr.vmx opID=f4b235a5-2c0e-4863-aa60-ede390c281ce-857833-9b-9b-144e user=vpxuser:VirtualCenter] Cannot attach a CBT enabled fcd to CBT disabled VM
2024-10-25T12:03:55.267Z info hostd[43843324] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b00233abe7dd1c-9317d6ffffed5f6f/a97c1b67-1219-cd8b-fa6e-78ac44a9a480/p2-services-cluster01-worker-pool-bpkdz-6fd9b5794fxzhgb6-24fmr.vmx opID=f4b235a5-2c0e-4863-aa60-ede390c281ce-857833-9b-9b-144e user=vpxuser:VirtualCenter] Reconfigure failed: N3Vim5Fault12InvalidState9ExceptionE(Fault cause: vim.fault.InvalidState]
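
To find which worker node VMs are affected on a host, the hostd log can be scanned for the "Cannot attach a CBT enabled fcd to CBT disabled VM" message. The sketch below is illustrative only: the function name is hypothetical, and it assumes each hostd entry is on one line and carries the .vmx path in the sub=Vmsvc.vm: field, as in the excerpt above.

```python
import re

# Message text copied from the hostd excerpt in this article.
CBT_MISMATCH = "Cannot attach a CBT enabled fcd to CBT disabled VM"
# The .vmx path follows "sub=Vmsvc.vm:" and contains no spaces in the excerpt.
VMX_PATH = re.compile(r"sub=Vmsvc\.vm:(\S+\.vmx)")


def vms_with_cbt_mismatch(hostd_lines):
    """Return the sorted .vmx paths of VMs that logged the CBT mismatch message."""
    paths = set()
    for line in hostd_lines:
        if CBT_MISMATCH in line:
            m = VMX_PATH.search(line)
            if m:
                paths.add(m.group(1))
    return sorted(paths)
```

Each returned path identifies a VM whose CBT setting should be compared with that of the disk being attached.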

The same failure also occurs if the user attempts to manually attach the disk to the Virtual Machine via the vCenter Server WebClient.
Changed Block Tracking (CBT) is not enabled on the Kubernetes worker node Virtual Machine to which the disk is being attached. To check this, right-click the Virtual Machine and navigate to Edit Settings > VM Options > Advanced > Configuration Parameters > Edit Configuration, then filter for ctkEnabled. If this parameter is either missing or set to "FALSE", CBT is not enabled on the Virtual Machine.
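
The ctkEnabled rule above can be expressed as a small check. This is a sketch under stated assumptions: the configuration parameters have been exported as key = "value" text (the layout used by .vmx files), the helper names are hypothetical, and treating only the value TRUE as enabled is an assumption made for illustration.

```python
def parse_config_params(text):
    """Parse key = "value" lines into a dict, skipping blanks and comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip('"')
    return params


def vm_cbt_enabled(params):
    """Return True only if ctkEnabled is present and set to TRUE.

    A missing key or the value FALSE means CBT is not enabled on the VM.
    """
    value = params.get("ctkEnabled")
    return value is not None and value.strip().upper() == "TRUE"
```

For example, vm_cbt_enabled(parse_config_params(text)) returns False when ctkEnabled is absent from the exported parameters, which matches this symptom.
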
Changed Block Tracking is enabled on the First Class Disk that is failing to attach. This can be checked via the vCenter Server Managed Object Browser (MOB) by opening the following URL in a web browser: https://<vc_fqdn>/mob/?moid=VStorageObjectManager&method=retrieveVStorageObject. In the window that opens, enter the volume ID and the datastore managed object reference.
The volume ID can be retrieved by selecting the cluster in the WebClient inventory, then Monitor > Cloud Native Storage > Container Volumes > click the Details icon on the impacted Volume Name > Basics > Volume ID.
The datastore managed object reference can be retrieved either from the vpxd log extract shown above or by selecting the datastore where the volume resides in the WebClient inventory and copying the "Datastore:" reference from the browser URL, e.g. datastore-5002.
Once the id and datastore details are populated in the pop-up window, click Invoke Method and confirm that the 'changedBlockTrackingEnabled' value returned shows as true (the screenshot example below shows false).
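
The two MOB inputs can also be pulled out of text that is already at hand. The following sketch extracts the disk ID and datastore reference from a vpxd.log excerpt like the one above, and the datastore reference from a copied browser URL. The function names are hypothetical, and it assumes the diskId logged by vpxd is the same value shown as Volume ID in the Container Volumes view.

```python
import re

# The disk ID appears as: id = "<36-character UUID>" under "Arg diskId:".
DISK_ID = re.compile(r'id\s*=\s*"([0-9a-fA-F-]{36})"')
# The datastore argument appears as: 'vim.Datastore:datastore-NNNN'.
LOG_DATASTORE = re.compile(r"vim\.Datastore:(datastore-\d+)")
# In the browser URL the reference follows "Datastore:".
URL_DATASTORE = re.compile(r"Datastore:(datastore-\d+)")


def mob_inputs_from_vpxd(excerpt):
    """Return (disk_id, datastore_ref) found in a vpxd attachDisk excerpt."""
    disk = DISK_ID.search(excerpt)
    datastore = LOG_DATASTORE.search(excerpt)
    return (
        disk.group(1) if disk else None,
        datastore.group(1) if datastore else None,
    )


def datastore_from_client_url(url):
    """Return the datastore reference embedded in a copied browser URL, if any."""
    m = URL_DATASTORE.search(url)
    return m.group(1) if m else None
```

Either function returns None for a value it cannot find, in which case the value should be collected manually as described above.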

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

2.x
3.x

Cause

Changed Block Tracking (CBT) is enabled on the First Class Disk (FCD) but not on the Kubernetes node VM.
CBT becomes enabled on a disk when it is attached to a CBT-enabled Virtual Machine that is being backed up. Detaching the FCD from the Virtual Machine does not disable CBT on the disk.
Per the Behavior of FCD With Changed Block Tracking section of the Virtual Disk Development Kit Programming Guide documentation, this is expected behavior:

"Attaching an FCD with CBT enabled to a VM with CBT disabled throws an error, unless the FCD is attached as "independent nonpersistent" disk. "

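The quoted rule can be written as a simple predicate. This is only a sketch of the single condition quoted above, not a model of every attach check vSphere performs. The function name is hypothetical, and "independent_nonpersistent" is assumed to be the disk mode string that corresponds to the "independent nonpersistent" mode named in the guide.

```python
def blocked_by_cbt_mismatch(fcd_cbt_enabled, vm_cbt_enabled, disk_mode="persistent"):
    """Return True when the quoted rule says the attach throws an error.

    Only the documented case is covered: a CBT-enabled FCD attached to a
    CBT-disabled VM, unless the disk mode is independent nonpersistent.
    A False result means this rule does not apply, not that the attach succeeds.
    """
    if fcd_cbt_enabled and not vm_cbt_enabled:
        return disk_mode != "independent_nonpersistent"
    return False
```

With the state described in this article (CBT enabled on the FCD, disabled on the worker node VM, default disk mode), the predicate returns True.
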
Resolution

Apply one of the following options:

Enable Changed Block Tracking on the Kubernetes Worker node VM(s) as outlined in the following KB: Changed Block Tracking (CBT) on virtual machines

OR

Disable Changed Block Tracking on the First Class Disk. To do this, perform the following steps:

Go to https://<vc_fqdn>/mob/?moid=VStorageObjectManager&method=clearVStorageObjectControlFlags
Provide the volume ID in the id field and the datastore managed object reference in the datastore field.
Provide enableChangedBlockTracking in the controlFlags field as follows:
<controlFlags>enableChangedBlockTracking</controlFlags>
Click Invoke Method.
Recheck the FCD changedBlockTrackingEnabled setting via the MOB, as outlined in the Symptoms section above, to confirm that it now shows as false.
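
When several volumes need the same change, it can help to generate the MOB URL and the values to enter for each one. The sketch below only assembles strings and does not call vCenter. The helper names are hypothetical, and the id and datastore values are returned as plain values because the exact input format of those two MOB fields is not covered in this article.

```python
from urllib.parse import urlencode


def mob_method_url(vc_fqdn, method):
    """Build the VStorageObjectManager MOB URL for the given method name."""
    query = urlencode({"moid": "VStorageObjectManager", "method": method})
    return f"https://{vc_fqdn}/mob/?{query}"


def clear_cbt_inputs(volume_id, datastore_ref):
    """Return the values to supply when clearing the CBT control flag."""
    return {
        "id": volume_id,
        "datastore": datastore_ref,
        # The controlFlags value is taken verbatim from the steps above.
        "controlFlags": "<controlFlags>enableChangedBlockTracking</controlFlags>",
    }
```

Calling mob_method_url with the vCenter FQDN and "clearVStorageObjectControlFlags" gives the URL from the first step. The same helper with "retrieveVStorageObject" gives the URL used for the recheck.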

OR

If CBT is being enabled on the FCD by a backup solution, and policy or process prevents you from using either of the previous two options, reconfigure the backup solution so that it does not use CBT (if that option is available).

Additional Information

Impact/Risks:
Please consult with your backup vendor for any possible impacts if you decide to disable Changed Block Tracking on the Virtual Machines as part of the resolution.