Red Hat OpenShift cluster deployment on vSAN cluster fails during provisioning of VMs phase

Article ID: 392106

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • The clone-from-template operation triggered by the Red Hat OpenShift cluster deployment script stalls at 39% and times out.

  • The cluster deployment fails.

  • Each time the script is re-run, the cloning stalls at a different VM stage (master VMs, worker VMs, or infra VMs).

  • vSAN Skyline Health reports no alerts indicating an issue with vSAN.

  • The vSAN proactive test for VM creation succeeds.

Environment

VMware vSAN 7.x

Cause

  • This issue occurs when a special component, the BlkAttr (Block-Attribute) component, is not present on the capacity-tier devices of the vSAN host.

  • The BlkAttr component was introduced in vSAN 6.2. It is a special-purpose component created on each capacity-tier device to enable features such as checksum, deduplication, and compression.

  • If the BlkAttr component is not initialized or not present on the capacity-tier devices, the following issues can occur:

    • Virtual machine provisioning fails due to performance problems or timeouts.

    • The vSAN cluster experiences very high component-level congestion or memory congestion.

  • The following entry appears in /var/run/log/vmkernel.log:

    YYYY-MM-DDTHH:MM:SS.SSSZ cpu53:2099149)WARNING: LSOM: LSOMCommitFlusherDispatch:5171: Throttled: BlkAttr not ready for disk ########-####-####-####-############ comp ########-####-####-####-############ (0x450083210438)
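To check whether a host is logging this warning, the log can be searched for the "BlkAttr not ready" string. The following is a minimal sketch, not an official Broadcom script: the function name blkattr_throttled_disks is hypothetical, and the parsing assumes the message format shown above, where the disk UUID follows the word "disk".

```shell
# Hypothetical helper (not shipped with ESXi): print each disk UUID that
# appears in "BlkAttr not ready" warnings, prefixed by its occurrence count.
# Assumes the message format "... BlkAttr not ready for disk <UUID> comp <UUID> ...".
blkattr_throttled_disks() {
    # Default to the log path named in this article if no argument is given.
    logfile="${1:-/var/run/log/vmkernel.log}"
    grep 'BlkAttr not ready for disk' "$logfile" \
        | sed 's/.*BlkAttr not ready for disk \([^ ]*\) .*/\1/' \
        | sort | uniq -c
}

# Usage on the vSAN host:
#   blkattr_throttled_disks /var/run/log/vmkernel.log
```

No output means the warning was not found in that log file. Older entries may have rotated out of the current log, so an empty result alone does not rule out the issue.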

Resolution

Steps to resolve the issue:

  1. Run the following script via SSH on the vSAN host to verify whether the BlkAttr component is present. The BlkAttr component UUID, 626c6b41-7474-7243-6f6d-706f6e656e74, is the same in all vSAN environments:

    for uuid in $(localcli vsan storage list|grep -B10 'Is Capacity Tier: true'|grep 'VSAN UUID:'|awk '{print $3}');do blkAttr=$(vsish -er ls /vmkModules/lsom/disks/$uuid/recoveredComponents/ 2>&1|grep 626c6b41-7474-7243-6f6d-706f6e656e74);if [ "$blkAttr" != "" ];then echo "Disk $uuid: blkAttr component found.";else echo "Disk $uuid: blkAttr component NOT FOUND.";fi;done

  2. On each host where the output shows "blkAttr component NOT FOUND", enable the component by running the following command:

    vsish -e set /vmkModules/lsom/lsomEnableBlkAttr
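
When a host has many capacity-tier disks, it can help to filter the step 1 output down to the affected disks before deciding whether to run the enable command. The following is a minimal sketch under stated assumptions: the function name blkattr_missing_disks is hypothetical, and it relies only on the two output strings printed by the step 1 script.

```shell
# Hypothetical helper: read the step 1 output on stdin and print only the
# UUIDs of disks reported as "NOT FOUND".
# Exit status: 0 if at least one disk is missing the component, 1 otherwise.
blkattr_missing_disks() {
    awk '/blkAttr component NOT FOUND/ {
             sub(/:$/, "", $2)   # strip the trailing colon after the UUID
             print $2
             found = 1
         }
         END { exit found ? 0 : 1 }'
}

# Usage, assuming the step 1 script was saved as check_blkattr.sh
# (a hypothetical file name):
#   sh check_blkattr.sh | blkattr_missing_disks
```

The helper only reports; it does not change any host setting. The enable command in step 2 still has to be run on the host.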

Once the BlkAttr component is enabled, re-run the OpenShift cluster deployment. It should now complete successfully.

If this does not resolve the issue, open a Broadcom Support ticket for further investigation.

Additional Information