vMotion failure on VMFS datastore

Article ID: 420252

Updated On:

Products

VMware vSphere ESXi
VMware vCenter Server

Issue/Introduction

vMotion of a virtual machine that resides on a VMFS datastore fails with the following errors.

vmware.log: 
YYYY-MM-DDT09:45:53.337Z In(05) vmx - [msg.configdb.open] An error occurred while opening configuration file "/vmfs/volumes/########-########-####-############/<VM Name>/<VM Name>.vmx": Failed to lock the file.
YYYY-MM-DDT09:45:53.337Z In(05) vmx - ----------------------------------------
YYYY-MM-DDT09:45:53.337Z Wa(03) vmx - Migrate: Failed to write out config file.
YYYY-MM-DDT09:45:53.337Z In(05) vmx - Migrate: Caching migration error message list:
YYYY-MM-DDT09:45:53.337Z In(05) vmx - [msg.migrate.expired] Timed out waiting for migration start request.

The VMFS datastore appears to have been damaged by an All Paths Down (APD) event that occurred before the vMotion operation was attempted.

vmkernel.log: 
YYYY-MM-DDT00:57:37.619Z In(182) vmkernel: cpu2:2097449)ScsiDevice: 5738: Device state of naa.################################ set to APD_START; token num:1
YYYY-MM-DDT00:57:37.619Z In(182) vmkernel: cpu2:2097449)StorageApdHandler: 1191: APD start for 0x4307e7f34870 [naa.################################]
YYYY-MM-DDT00:57:37.619Z In(182) vmkernel: cpu6:2097646)StorageApdHandler: 408: APD start event for 0x4307e7f34870 [naa.################################]
YYYY-MM-DDT00:57:37.619Z In(182) vmkernel: cpu6:2097646)StorageApdHandlerEv: 106: Device or filesystem with identifier [naa.################################] has entered the All Paths Down state.
YYYY-MM-DDT00:58:25.423Z In(182) vmkernel: cpu41:2097875)NMP: nmp_ResetDeviceLogThrottling:3854: last error status from device naa.################################ repeated 2 times
YYYY-MM-DDT00:59:57.622Z In(182) vmkernel: cpu0:2097646)StorageDevice: 4647: No Handlers registered! (naa.################################)!
YYYY-MM-DDT00:59:57.622Z In(182) vmkernel: cpu0:2097646)StorageApdHandler: 606: APD timeout event for 0x4307e7f34870 [naa.################################]
YYYY-MM-DDT00:59:57.622Z In(182) vmkernel: cpu0:2097646)StorageApdHandlerEv: 120: Device or filesystem with identifier [naa.################################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will n$
YYYY-MM-DDT01:00:58.983Z In(182) vmkernel: cpu3:17113648)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0xa3 (0x45b9db766e40, 0) to dev "naa.################################" on path "vmhba#:C#:T#:L#" Failed:
YYYY-MM-DDT01:00:58.983Z In(182) vmkernel: cpu17:10191629)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x12 (0x45b9db766e40, 0) to dev "naa.################################" on path "vmhba#:C#:T#:L#" Failed:
YYYY-MM-DDT01:00:58.984Z In(182) vmkernel: cpu9:2097450)ScsiDevice: 5760: Setting Device naa.################################ state back to 0x2
YYYY-MM-DDT01:00:58.985Z In(182) vmkernel: cpu9:2097450)StorageDevice: 4647: No Handlers registered! (naa.################################)!
YYYY-MM-DDT01:00:58.985Z In(182) vmkernel: cpu9:2097450)ScsiDevice: 5792: Device naa.################################ is Out of APD; token num:1
YYYY-MM-DDT01:00:58.985Z In(182) vmkernel: cpu1:2097646)StorageApdHandler: 500: APD exit event for 0x4307e7f34870 [naa.################################, 0]
YYYY-MM-DDT01:00:58.985Z In(182) vmkernel: cpu1:2097646)StorageApdHandlerEv: 113: Device or filesystem with identifier [naa.################################] has exited the All Paths Down state.
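
To confirm that an APD event preceded the failed vMotion, the StorageApdHandlerEv messages above can be extracted from vmkernel.log and turned into a per-device timeline. The following Python snippet is a minimal sketch, not a supported tool: the function names apd_events and apd_durations are hypothetical, and it assumes the log lines carry real ISO 8601 timestamps in place of the YYYY-MM-DD placeholder used in this article.

```python
import re
from datetime import datetime

# Matches the StorageApdHandlerEv messages shown in the excerpt above.
# The device or filesystem identifier is taken from the [...] part.
APD_EVENT = re.compile(
    r"^(?P<ts>\S+)\s.*StorageApdHandlerEv: \d+: "
    r"Device or filesystem with identifier \[(?P<dev>[^\]]+)\] has "
    r"(?P<what>entered the All Paths Down Timeout state"
    r"|entered the All Paths Down state"
    r"|exited the All Paths Down state)"
)

KIND = {
    "entered the All Paths Down state": "start",
    "entered the All Paths Down Timeout state": "timeout",
    "exited the All Paths Down state": "exit",
}


def apd_events(lines):
    """Return (timestamp, device, kind) tuples for APD start/timeout/exit lines."""
    events = []
    for line in lines:
        m = APD_EVENT.match(line)
        if m:
            events.append((m.group("ts"), m.group("dev"), KIND[m.group("what")]))
    return events


def apd_durations(events):
    """Pair each 'start' with the next 'exit' of the same device.

    Returns (device, seconds_in_apd) tuples. Assumes timestamps look like
    2024-01-01T00:57:37.619Z (an assumption; the article masks the date).
    """
    started = {}
    durations = []
    for ts, dev, kind in events:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
        if kind == "start":
            started[dev] = t
        elif kind == "exit" and dev in started:
            durations.append((dev, (t - started.pop(dev)).total_seconds()))
    return durations
```

Applied to the excerpt above, the times of day alone show the device entering APD at 00:57:37.619 and leaving it at 01:00:58.985, roughly 201 seconds later, with the APD timeout reached after the 140 seconds reported in the log.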

As shown below, the host is unable to obtain a lock on the virtual machine's .vmx file on the affected VMFS datastore.

YYYY-MM-DDT09:45:42.264Z In(182) vmkernel: cpu22:19916846 opID=5898e258)DLX: 4607: vol '<Datastore Name>', lock at 120102912: [Req mode 1] Checking liveness:
YYYY-MM-DDT09:45:46.386Z In(182) vmkernel: cpu22:19916846 opID=5898e258)DLX: 5280: vol '<Datastore Name>', lock at 120102912: Lock type: 10C00001. [Req mode: 1] Not free, pollStat 3:
YYYY-MM-DDT09:45:46.386Z In(182) vmkernel: cpu22:19916846 opID=5898e258)DLX: 5446: Vol: <Datastore Name> Lock at 120102912:<Datastore Name> type 281018369, lockTimeUS 4122782, leaseWaitTimeMS 4122, status: Lock was not free
YYYY-MM-DDT09:45:46.386Z In(182) vmkernel: cpu22:19916846 opID=5898e258)DLX: 2670: vol '<Datastore Name>', lock at 120102912: Lock type: 10C00001. Exclusive Lock(s) held on a file on volume ########-########-####-############. numHolders:0 gblNumHolders:0, volume state 10,$
YYYY-MM-DDT09:45:46.386Z In(182) vmkernel: cpu22:19916846 opID=5898e258)DLX: 2679: vol '<Datastore Name>', lock at 120102912: Lock type: 10C00001. owner(s) MAC: ##:##:##:##:##:##:
YYYY-MM-DDT09:45:46.386Z In(182) vmkernel: cpu22:19916846 opID=5898e258)Fil3: 5205: Lock failed on file: <VM Name>.vmx on vol '<Datastore Name>' with FD: <FD c49 r5>
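
When several virtual machines or datastores are involved, it can help to list every file for which locking failed together with the lock owner MAC address logged just before it. The following Python snippet is a minimal sketch based only on the message formats in the excerpt above; the function name lock_failures is hypothetical, and the pairing of an owner MAC with a failure by volume name is an assumption about how the lines relate.

```python
import re

# "Lock failed on file" message, as in the Fil3 line of the excerpt above.
LOCK_FAILED = re.compile(
    r"Fil3: \d+: Lock failed on file: (?P<file>.+?) on vol '(?P<vol>[^']+)'"
)

# DLX message that reports the MAC address of the current lock owner.
LOCK_OWNER = re.compile(
    r"DLX: \d+: vol '(?P<vol>[^']+)', lock at (?P<addr>\d+): "
    r".*owner\(s\) MAC: (?P<mac>\S+)"
)


def lock_failures(lines):
    """Return one dict per 'Lock failed on file' line.

    Each dict holds the file name, the volume name, and the most recent
    owner MAC logged for that volume (None if no owner line was seen).
    """
    owners = {}
    failures = []
    for line in lines:
        m = LOCK_OWNER.search(line)
        if m:
            # The logged value ends with a trailing colon; drop it.
            owners[m.group("vol")] = m.group("mac").rstrip(":")
            continue
        m = LOCK_FAILED.search(line)
        if m:
            vol = m.group("vol")
            failures.append(
                {"file": m.group("file"), "volume": vol, "owner_mac": owners.get(vol)}
            )
    return failures
```

The MAC address returned here can then be compared with the network adapters of the hosts in the cluster to find which host, if any, still appears as the lock owner.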

 

Environment

VMware vSphere ESXi
VMware vCenter Server

Cause

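The VMFS datastore is corrupted. The logs suggest that the corruption was triggered by the APD event that preceded the vMotion attempt.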

Resolution

Recreate the affected VMFS datastore.

Alternatively, VOMA might be able to fix the corruption. For more information, see "Using vSphere On-disk Metadata Analyzer (VOMA) to check VMFS metadata consistency".