vMotion fails at 20% for MSCS/Microsoft WSFC VMs with RDM disks

Article ID: 338906

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:


Migration (vMotion) of an MSCS/WSFC virtual machine with RDM disks stalls at 20% and eventually fails.

The logs on the destination host show events similar to the following examples:

hostd.log
YYYY-MM-DDT08:16:10.011Z warning hostd[1234567] [Originator@1234 sub=Vmsvc.vm:/vmfs/volumes/UUID/TEST/TEST.vmx opID=abcd123-123456-auto-1new23-h5:xxxxxxxx-xx-xx-xx-xxxx user=vpxuser:admin\admin] PopulateCache failed: _diskAccess : false, _storageAccessible : true
YYYY-MM-DDT08:16:10.012Z warning hostd[1234567] [Originator@1234 sub=Vmsvc.vm:/vmfs/volumes/UUID/TEST/TEST.vmx opID=abcd123-123456-auto-1new23-h5:xxxxxxxx-xx-xx-xx-xxxx user=vpxuser:admin\admin] FetchUpdatedLayout: No cached layout files available. Doing a full fetch
YYYY-MM-DDT08:16:10.012Z warning hostd[1234567][Originator@1234 sub=Vmsvc.vm:/vmfs/volumes/UUID/TEST/TEST.vmx opID=abcd123-123456-auto-1new23-h5:xxxxxxxx-xx-xx-xx-xxxx user=vpxuser:admin\admin] CannotRetrieveCorefiles: VM disk access is turned off

vmkernel.log
YYYY-MM-DDT08:19:10.012Z cpu58:2097339)ScsiDeviceIO: 3484: Cmd(0x45cadfc718c0) 0x1a, CmdSN 0x1943355 from world 0 to dev "naa.6589cfc00000056ef3af090272007105" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
YYYY-MM-DDT08:19:10.012Z cpu56:2098449)WARNING: ScsiCore: 1851: Invalid sense buffer: error=0x0, valid=0x0
YYYY-MM-DDT08:19:10.012Z cpu56:2098449)NMP: nmp_ResetDeviceLogThrottling:3580: Error status H:0x0 D:0x18 P:0x0 Sense Data: 0x0 0x0 0x0 from dev "naa.6589cfc00000056ef3af090272007105" occurred 2344 times(of 2344 commands)
YYYY-MM-DDT08:19:10.012Z cpu56:2098449)WARNING: ScsiCore: 1851: Invalid sense buffer: error=0x0, valid=0x0
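
To check a destination host for these messages, a search along the following lines may help. This is a minimal sketch: the function name is hypothetical, the default paths assume the standard ESXi log locations, and the patterns are taken only from the example entries above, so they may need adjusting for other builds.

```shell
# Search the destination host's logs for the messages shown above.
# Defaults assume the standard ESXi log locations; pass other paths to
# inspect logs extracted from a support bundle.
find_rdm_vmotion_symptoms() {
    hostd_log="${1:-/var/log/hostd.log}"
    vmkernel_log="${2:-/var/log/vmkernel.log}"

    # hostd reports that disk access for the VM is turned off
    grep -E 'PopulateCache failed|CannotRetrieveCorefiles' "$hostd_log"

    # vmkernel reports failed SCSI commands; device status D:0x18 is the
    # SCSI RESERVATION CONFLICT status
    grep -E 'ScsiDeviceIO:.*failed H:0x5|Error status H:0x0 D:0x18' "$vmkernel_log"
}
```

Calling `find_rdm_vmotion_symptoms` with no arguments searches the live logs on the host; the naa ID in any matching vmkernel.log line identifies the affected device.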


Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 6.x

Cause

The perennially reserved flag is not enabled on the LUNs that are presented to the MSCS VMs as physical RDMs.

WSFC cluster nodes that are spread over several ESXi hosts require physical RDMs. The RDMs are shared among all hosts where cluster nodes run. The host with the active node holds persistent SCSI-3 reservations on all shared RDM devices. 
When the active node is running and devices are locked, no other host can write to the devices. The same issue might also affect rescan operations.


Resolution


Set the perennially reserved flag to true on all disks that are configured as physical RDMs on the Windows clustered VMs.

Command: esxcli storage core device setconfig -d naa.id --perennially-reserved=true

Refer: Change Perennial Reservation Settings  

Note: Please ensure that the Perennially reserved flag is set to True for the RDM disks across all the ESXi hosts in the cluster
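
When several RDM LUNs are involved, the command above can be wrapped in a small loop that also reads the setting back. The following is a sketch rather than a supported procedure: the function name is hypothetical, and the naa IDs passed to it are placeholders for the real RDM device IDs.

```shell
# Set the perennially reserved flag on each given device, then print the
# resulting state. Repeat on every ESXi host that can see the RDM LUNs.
set_perennially_reserved() {
    for dev in "$@"; do
        esxcli storage core device setconfig -d "$dev" --perennially-reserved=true || return 1
        # The device listing should now report "Is Perennially Reserved: true"
        esxcli storage core device list -d "$dev" | grep -i 'Is Perennially Reserved'
    done
}
```

For example, `set_perennially_reserved naa.id1 naa.id2` processes two devices in turn and stops at the first device for which the setting cannot be applied.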

Additional Information

Validate the configuration of the MSCS Virtual Machines and the disks attached.

vSphere 6.0 adds support for vMotion of MSCS clustered virtual machines.

Pre-requisites for vMotion support:

  1. vMotion is supported only for a cluster of virtual machines across physical hosts (CAB) with pass-through RDMs.
  2. The vMotion network must be a 10 Gbps Ethernet link. A 1 Gbps Ethernet link is not supported for vMotion of MSCS virtual machines.
  3. vMotion is supported for Windows Server 2008 SP2 and later releases. Windows Server 2003 is not supported.
  4. The MSCS cluster heartbeat time-out must be modified to allow 10 missed heartbeats.
  5. The virtual hardware version of the MSCS virtual machine must be version 11 or later.



Impact/Risks:


Caution: Do not set the perennially reserved flag to true on disks that back VMFS volumes.

Refer: https://kb.vmware.com/s/article/2040666
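
One way to guard against the situation described in the caution is to check, before changing a device, whether it appears in the host's VMFS extent list. This is a minimal sketch; the function name is hypothetical, and it relies on a plain text match against the output of `esxcli storage vmfs extent list`.

```shell
# Return success (0) if the given device backs a VMFS extent on this host.
# Such a device must not be marked perennially reserved.
backs_vmfs_volume() {
    esxcli storage vmfs extent list | grep -qF -- "$1"
}

# Example use (the naa ID is a placeholder):
#   if backs_vmfs_volume naa.id; then echo "skip: VMFS device"; fi
```

The check covers only VMFS volumes known to the host where it runs, so it is worth repeating on each host before the flag is set there.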