Virtual machine (VM) residing on a specific datastore fails to power on

Article ID: 401225

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A VM residing on a specific datastore fails to power on with the following message:
An error occurred while creating temporary file for /vmfs/volumes/########-########-####-############/VM_Name/VM_Name.vmx: The file already exists Failed to start the virtual machine (error -18). Cannot open the configuration file /vmfs/volumes/########-########-####-############/VM_Name/VM_Name.vmx.

Environment

ESXi 7.0
ESXi 8.0

VMFS6

Cause

The LUN backing the datastore was inadvertently reused as the target of an ESXi installation.

The following messages are logged repeatedly in the vmkernel log:

vmkwarning: cpu28:2098570)WARNING: FS3: 636: VMFS volume Datastore_Name/########-########-####-############ on naa.################################:1 has been detected corrupted
vmkernel: cpu28:2098570)FS3: 639: While filing a PR, please report the names of all hosts that attach to this LUN, tests that were running on them,
vmkernel: cpu28:2098570)FS3: 662: and upload the dump by `voma -m vmfs -f dump -d /vmfs/devices/disks/naa.################################:1 -D X`
vmkernel: cpu28:2098570)FS3: 665: where X is the dump file name on a DIFFERENT volume


vmkwarning: cpu12:21082291)WARNING: DLX: 1022: Volume ########-########-####-############ ("Datastore_Name") might be damaged on the disk. Corrupt lock detected at offset #########
vmkwarning: cpu12:21082291)WARNING: [type b9b0920e offset ################### v ####################, hb offset ###################
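To check a saved vmkernel log for these two signatures, a small script along the following lines may help. It is only a sketch: the regular expressions are derived from the message text quoted above, the function name `find_corruption_events` is hypothetical, and the exact message wording may differ between ESXi builds.

```python
import re

# Patterns derived from the vmkernel messages quoted in this article.
# Assumption: the wording is the same on the ESXi build being examined.
CORRUPT_VOLUME = re.compile(
    r"VMFS volume (?P<name>.+)/(?P<uuid>\S+) on (?P<device>\S+) "
    r"has been detected corrupted"
)
CORRUPT_LOCK = re.compile(
    r'Volume (?P<uuid>\S+) \("(?P<name>[^"]+)"\) might be damaged on the disk\. '
    r"Corrupt lock detected"
)


def find_corruption_events(lines):
    """Return a list of (kind, datastore_name, volume_uuid) tuples,
    one per log line that matches either corruption signature."""
    events = []
    for line in lines:
        match = CORRUPT_VOLUME.search(line)
        if match:
            events.append(("volume_corrupted", match.group("name"), match.group("uuid")))
            continue
        match = CORRUPT_LOCK.search(line)
        if match:
            events.append(("corrupt_lock", match.group("name"), match.group("uuid")))
    return events
```

The function accepts any iterable of log lines, for example an open file handle on a copy of the vmkernel log taken from a support bundle. Repeated hits for the same datastore name and UUID are consistent with the behavior described here, but they do not by themselves prove that the LUN was overwritten; the partition table check is still needed.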




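The partition table on the LUN was overwritten with the ESXi OS partition layout (systemPartition, linuxNative, vmfsl) in place of the original VMFS datastore partition: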
[root@esxi##:~] partedUtil getptbl /vmfs/devices/disks/naa.################################
gpt
80935 255 63 1300234240
1 64 204863 C12A7328F81F11D2BA4B00A0C93EC93B systemPartition 128
5 208896 8595455 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
6 8597504 16984063 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
7 16986112 268435455 4EB2EA3978554790A79EFAE495E21F8D vmfsl 0
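The output above can also be interpreted programmatically. The following is a minimal sketch that parses `partedUtil getptbl` text and flags a layout resembling an ESXi installation. The function names, the 512-byte sector size, and the heuristic itself (a `systemPartition` and a `vmfsl` partition present, no `vmfs` partition) are assumptions made for illustration, not a documented detection method.

```python
SECTOR_BYTES = 512  # assumption: the device reports 512-byte logical sectors


def parse_getptbl(text):
    """Parse `partedUtil getptbl` output into (label, total_sectors, partitions).

    Expected layout, as in the output shown above:
      line 1: partition table label (for example "gpt")
      line 2: cylinders heads sectors-per-track total-sectors
      rest:   number start-sector end-sector type-GUID type-name attribute
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    label = rows[0][0]
    total_sectors = int(rows[1][3])
    partitions = []
    for fields in rows[2:]:
        start, end = int(fields[1]), int(fields[2])
        partitions.append({
            "number": int(fields[0]),
            "start": start,
            "end": end,
            "type_guid": fields[3],
            "type_name": fields[4],
            "size_gib": (end - start + 1) * SECTOR_BYTES / 2**30,
        })
    return label, total_sectors, partitions


def looks_like_esxi_install(partitions):
    """Heuristic (assumption): ESXi system partitions present, no datastore partition."""
    names = {p["type_name"] for p in partitions}
    return "systemPartition" in names and "vmfsl" in names and "vmfs" not in names
```

Applied to the table above and assuming 512-byte sectors, the device totals 1300234240 sectors (620 GiB) and partition 7 ends at sector 268435455, so the ESXi layout occupies the first 128 GiB of the LUN. An intact datastore LUN would normally show a partition of type `vmfs` instead.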

Resolution

This change is irreversible. The virtual machines cannot be recovered because the physical blocks on the LUN have been overwritten.
Restore the virtual machines from valid backups or rebuild them.

Additional Information

If ESXi hosts to which the affected LUN is presented are rebooted, they fail to boot with a PSOD reporting that two ESXi installations were detected:
Error: The system has found a problem on your machine and cannot continue. Two filesystems with the same UUID have been detected. Make sure you do not have two ESXi installations.



To resolve this, unpresent the affected LUN from the ESXi hosts at the storage array, then reboot the hosts.