Attaching a linked clone PVC to a powered-on VM that already has the source PVC attached fails.

Products

Tanzu Kubernetes Runtime

Issue/Introduction

The issue can present in multiple ways: from within a vSphere Kubernetes Service (VKS) cluster, or when VM Service VMs are used.

The issue can also occur when two different Pods, one using the source volume and the other using the linked clone volume, are scheduled on the same node.

The vCenter UI shows a failure similar to:

Failed to add disk scsi0:2. Virtual disks "/vmfs/volumes/<volume>/fcd/<vmdkname>.vmdk"
"/vmfs/volumes/<volume>/fcd/<vmdkname>.vmdk" have the same UUID. Failed to power on scsi0:2.

The example below focuses on VKS, but it applies to VM Service VMs as well.

Consider a case where there is a source Persistent Volume Claim (PVC). A linked clone volume can be created from the source PVC by first taking a VolumeSnapshot of the source PVC and then creating a new PVC from that snapshot with the necessary annotation.

root [ ~ ]# cat lc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    csi.vsphere.volume/fast-provisioning: "true"
  name: vs1-lc-1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSource:
    name: vs-1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: wcpglobal-storage-profile

Here, vs1-lc-1 is the linked clone volume.

The VKS cluster has a source volume and a linked clone volume:
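The snapshot step described above is not shown in this article. As a minimal sketch, assuming the snapshot is named vs-1 (the name referenced by dataSource in lc.yaml) and the source PVC is src-pvc, a VolumeSnapshot manifest could look like the following. The volumeSnapshotClassName value is a placeholder and must be replaced with a class that exists in your cluster.

```yaml
# Sketch only: snapshot of the source PVC that the linked clone is created from.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vs-1                      # must match dataSource.name in lc.yaml
spec:
  # Placeholder: substitute a VolumeSnapshotClass available in your cluster.
  volumeSnapshotClassName: <volumesnapshotclass-name>
  source:
    persistentVolumeClaimName: src-pvc   # the source PVC
```

The snapshot should report readyToUse: true in its status before the linked clone PVC is created from it.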

root[ ~ ]# kubectl --kubeconfig gcconfig.yaml -n testns get pvc
NAME       STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE
src-pvc    Bound    <pvc-id>   1Gi        RWO            wcpglobal-storage-profile   <unset>                 172m
vs1-lc-1   Bound    <pvc-id>   1Gi        RWO            wcpglobal-storage-profile   <unset>                 168m

Now, when we create a Pod whose spec references both the source volume and the linked clone volume, the Pod remains in the ContainerCreating state.

root [ ~ ]# cat pod_src_lc.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  namespace: testns
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-container
    image: <image>
    command: ["<command>"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - name: test-volume
      mountPath: /mnt/volume1
    - name: vs1-lc-1
      mountPath: /mnt/volume2
  restartPolicy: Never
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: src-pvc
  - name: vs1-lc-1
    persistentVolumeClaim:
      claimName: vs1-lc-1

root@ [ ~ ]# kubectl --kubeconfig gcconfig.yaml -n testns apply -f pod_src_lc.yaml
pod/pod1 created
root@ [ ~ ]# kubectl --kubeconfig gcconfig.yaml -n testns get pods
NAME   READY   STATUS              RESTARTS   AGE
pod1   0/1     ContainerCreating   0          59s

As the Pod spec shows, both the source volume and the linked clone volume are listed as volume mounts. As the output above shows, the Pod remains in the ContainerCreating state.

The error can be seen in the CnsNodeVMBatchAttachment resource in the Supervisor cluster:

kubectl get cnsnodevmbatchattachments.cns.vmware.com -n testns <worker-name> -oyaml

apiVersion: cns.vmware.com/v1alpha1
kind: CnsNodeVMBatchAttachment
metadata:
  creationTimestamp: "<timestamp>Z"
  finalizers:
  - cns.vmware.com
  generation: 4
  name: <worker-name>
  namespace: testns
  ownerReferences:
  - apiVersion: vmoperator.vmware.com/v1alpha5
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: <worker-name>
    uid: <UID>
  resourceVersion: "3132610"
  uid: <UID>
spec:
  instanceUUID: <instanceUUID>
  volumes:
  - name: <worker-name>-98babe60
    persistentVolumeClaim:
      claimName: <worker-name>-98babe60
      controllerKey: 1000
      diskMode: persistent
      sharingMode: sharingNone
      unitNumber: 0
  - name: <claimname>
    persistentVolumeClaim:
      claimName: <claimname>
      controllerKey: 1000
      diskMode: persistent
      sharingMode: sharingNone
      unitNumber: 1
  - name: <claimname>
    persistentVolumeClaim:
      claimName: <claimname>
      controllerKey: 1000
      diskMode: persistent
      sharingMode: sharingNone
      unitNumber: 2
status:
  conditions:
  - lastTransitionTime: "<timestamp>"
    message: 'failed to attach volumes: <cns-volumeID>'
    reason: Failed
    status: "False"
    type: Ready
  volumes:
  - name: <worker-name>-98babe60
    persistentVolumeClaim:
      claimName: <worker-name>-98babe60
      cnsVolumeId: <cns-volumeID>
      conditions:
      - lastTransitionTime: "<timestamp>Z"
        message: ""
        reason: "True"
        status: "True"
        type: VolumeAttached
      diskUUID: <diskID>
  - name: <claimname>
    persistentVolumeClaim:
      claimName: <claimname>
      cnsVolumeId: <cns-volumeID>
      conditions:
      - lastTransitionTime: "<timestamp>Z"
        message: ""
        reason: "True"
        status: "True"
        type: VolumeAttached
      diskUUID: <diskUUID>
  - name: <claimname>
    persistentVolumeClaim:
      claimName: <claimname>
      cnsVolumeId: <cns-volumeID>
      conditions:
      - lastTransitionTime: "<timestamp>"
        message: 'failed to attach cns volume: "<volume-name>"
          Error: Failed to add disk scsi0:2.'
        reason: AttachFailed
        status: "False"
        type: VolumeAttached

The status of the third volume (unitNumber 2, scsi0:2) reports AttachFailed, and the overall Ready condition is "False".

As mentioned earlier, the same situation can occur when the volumes are attached to VM Service VMs.

Environment

VCF 9.1

Cause

This is a limitation of linked clone volumes in the underlying First Class Disk (FCD) layer.

Resolution

There is no direct resolution for this scenario in a VKS cluster. If the scenario involves two different Pods, one using the source volume and the other using the linked clone volume, you can use Pod anti-affinity rules to schedule the Pods on different nodes. This does not work if the cluster has only one node. Similarly, avoid specifying the source and linked clone volumes as mounts within the same Pod; only one mount will succeed.

The issue presents similarly with VM Service VMs.
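The anti-affinity approach can be sketched as follows. This is an illustrative example, not a tested configuration: the Pod name pod-lc and the label example.com/volume-role are hypothetical, and the sketch assumes the Pod that mounts src-pvc carries the label example.com/volume-role: source.

```yaml
# Sketch only: Pod that mounts the linked clone volume and refuses to be
# scheduled on a node that already runs the Pod mounting the source volume.
apiVersion: v1
kind: Pod
metadata:
  name: pod-lc                          # hypothetical name
  namespace: testns
  labels:
    example.com/volume-role: linked-clone   # hypothetical label
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            # Assumption: the source-volume Pod is labeled with this key/value.
            example.com/volume-role: source
        topologyKey: kubernetes.io/hostname   # one topology domain per node
  containers:
  - name: test-container
    image: <image>
    command: ["<command>"]
    volumeMounts:
    - name: vs1-lc-1
      mountPath: /mnt/volume2
  restartPolicy: Never
  volumes:
  - name: vs1-lc-1
    persistentVolumeClaim:
      claimName: vs1-lc-1
```

Because the rule is a required one, the Pod stays Pending if no other node is available, which matches the single-node caveat above.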