When the Red Hat OpenShift cluster deployment script clones VMs from a template, the clone task stalls at 39% and times out.
The cluster deployment fails.
Each time the script is re-run, the cloning stalls at a different stage (master VMs, worker VMs, or infra VMs).
vSAN Skyline Health reports no alerts indicating an issue with vSAN.
The vSAN proactive VM creation test succeeds.
VMware vSAN 7.x
This issue occurs when the BlkAttr (Block-Attribute) component is not present on the capacity-tier devices of the vSAN host.
vSAN 6.2 introduced this special-purpose component on each capacity-tier device to enable features such as checksum, deduplication, and compression.
If the BlkAttr component is not initialized or not present on the capacity-tier devices, the following issues can occur:
Virtual machine provisioning fails because of performance problems or timeouts.
The vSAN cluster experiences very high component-level congestion or memory congestion.
The following entry can be seen in /var/run/log/vmkernel.log:
YYYY-MM-DDTHH:MM:SS.SSSZ cpu53:2099149)WARNING: LSOM: LSOMCommitFlusherDispatch:5171: Throttled: BlkAttr not ready for disk ########-####-####-####-############ comp ########-####-####-####-############ (0x450083210438)
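To see which disks on a host have logged this warning, the log can be filtered for the throttling message. The following is a minimal POSIX shell sketch, assuming the message text matches the sample entry above; the function name `list_throttled_blkattr_disks` is hypothetical and not part of any VMware tooling. It prints each distinct disk UUID named in a "BlkAttr not ready" warning.

```shell
# Hypothetical helper: print each distinct disk UUID that appears in a
# "Throttled: BlkAttr not ready for disk <uuid> comp <uuid>" warning.
# Assumes the message format matches the sample vmkernel.log entry.
# Optional argument: path to the log (defaults to the live vmkernel.log).
list_throttled_blkattr_disks() {
    log="${1:-/var/run/log/vmkernel.log}"
    grep 'Throttled: BlkAttr not ready for disk' "$log" \
        | sed -n 's/.*BlkAttr not ready for disk \([^ ]*\) comp .*/\1/p' \
        | sort -u
}
```

Run it with no argument on the host, or pass the path of a copied vmkernel.log (for example from a support bundle) to inspect it offline.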
Steps to resolve the issue:
Connect to the vSAN host over SSH and run the following script to verify whether the BlkAttr component is present on each capacity-tier disk. The BlkAttr component UUID, 626c6b41-7474-7243-6f6d-706f6e656e74, is the same in all vSAN environments:
for uuid in $(localcli vsan storage list|grep -B10 'Is Capacity Tier: true'|grep 'VSAN UUID:'|awk '{print $3}');do blkAttr=$(vsish -er ls /vmkModules/lsom/disks/$uuid/recoveredComponents/ 2>&1|grep 626c6b41-7474-7243-6f6d-706f6e656e74);if [ "$blkAttr" != "" ];then echo "Disk $uuid: blkAttr component found.";else echo "Disk $uuid: blkAttr component NOT FOUND.";fi;done
On any host whose output shows "blkAttr component NOT FOUND", enable the component by running the following command:
vsish -e set /vmkModules/lsom/lsomEnableBlkAttr
After BlkAttr is enabled, re-run the OpenShift cluster deployment; it should complete successfully.
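When many hosts need to be checked from automation, the verification script can be wrapped so that it prints only the disks missing the component and returns a non-zero exit status when any are found. This is a minimal sketch that reuses the same `localcli vsan storage list` and `vsish -er ls` invocations as the verification script; the function name `check_blkattr` and its exit-status convention are assumptions, not part of any VMware tooling.

```shell
# Well-known BlkAttr component UUID (the same on all vSAN hosts).
BLKATTR_UUID=626c6b41-7474-7243-6f6d-706f6e656e74

# Hypothetical helper: returns 0 if every capacity-tier disk on this host
# lists the BlkAttr component among its recovered components, 1 otherwise.
# Each disk without the component is printed.
check_blkattr() {
    missing=0
    # Same disk selection as the one-line script: VSAN UUIDs of capacity-tier disks.
    for uuid in $(localcli vsan storage list \
            | grep -B10 'Is Capacity Tier: true' \
            | grep 'VSAN UUID:' \
            | awk '{print $3}'); do
        # Look for the BlkAttr UUID under this disk's recoveredComponents node.
        if ! vsish -er ls "/vmkModules/lsom/disks/$uuid/recoveredComponents/" 2>&1 \
                | grep -q "$BLKATTR_UUID"; then
            echo "Disk $uuid: blkAttr component NOT FOUND."
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_blkattr || echo "BlkAttr missing on $(hostname)"` flags a host that still needs the `vsish -e set` step.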
If enabling BlkAttr does not resolve the issue, open a Broadcom Support ticket for further investigation.