After HA is triggered VMs are reported "Moved or Copied" in NFSv3 environment

Article ID: 323108

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • After HA is triggered, vSphere Client reports that VMs have been "Moved or Copied".
  • A split-brain condition occurs in which the VMs seem to be running on multiple ESXi hosts.
  • The VMs report lock errors during VM power-on.
  • Messages similar to the following appear in vmware.log:
2019-02-24T04:52:25.477Z| vmx| I125: Msg_Question:
2019-02-24T04:52:25.477Z| vmx| I125: [msg.uuid.altered] This virtual machine might have been moved or copied.
2019-02-24T04:52:25.477Z| vmx| I125+ In order to configure certain management and networking features, VMware ESX needs to know if this virtual machine was moved or copied.
2019-02-24T04:52:25.477Z| vmx| I125+
2019-02-24T04:52:25.477Z| vmx| I125+ If you don't know, answer "I Copied It".
2019-02-24T04:52:25.477Z| vmx| I125:
2019-02-24T04:52:25.477Z| vmx| I125: ----------------------------------------
2019-02-24T04:52:25.479Z| vmx| I125: Vigor_ClientRequestCb: failed to do op=3 on unregistered device 'Tools' (cmd=queryFields)
2019-02-24T04:52:25.479Z| vmx| I125: Vigor_ClientRequestCb: failed to do op=3 on unregistered device 'CrashDetector' (cmd=queryFields)
An answer is then received from hostd. Reply choice 1 means "I Moved It":
2019-02-24T04:55:46.537Z| vmx| I125: VigorTransportProcessClientPayload: opID=SWI-4bb841e1-646f seq=35: Receiving Bootstrap.MessageReply request.
2019-02-24T04:55:46.538Z| vmx| I125: VigorTransport_ServerSendResponse opID=SWI-4bb841e1-646f seq=35: Completed Bootstrap request.
2019-02-24T04:55:46.538Z| vmx| I125: MsgQuestion: msg.uuid.altered reply=1
2019-02-24T04:55:46.538Z| vmx| I125: UUID: Writing uuid.location value: '56 4d f2 a6 10 2e ad d2-e5 51 95 0c 71 21 d7 db'
Opening all the disks took about 3.7 minutes:
2019-02-24T04:59:28.782Z| vmx| I125: DISK: Opening disks took 222203 ms.
The vmkernel log shows the VM <vmname> starting:
2019-02-24T04:52:25.438Z cpu56:25138582)World: vm 25302727: 7379: Starting world vmm1:<vmname> of type 8
NFS file locks:
2019-02-24T04:53:56.595Z cpu75:18209067 opID=50304a5d)WARNING: NFSLock: 2219: File is being locked by a consumer on host <hostname> with exclusive lock.


Note: The preceding log excerpts are only examples. Dates, times, and environmental variables may vary depending on your environment.
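
The vmware.log entries above can be checked with a short script. The following is a minimal sketch, assuming the log lines match the format of the excerpt in this article. The function name summarize_vmware_log and the regular expressions are illustrative and are not part of any VMware tool.

```python
import re

# Patterns taken from the vmware.log excerpt in this article.
# Assumption: other ESXi builds may word these messages differently.
QUESTION_RE = re.compile(r"\[msg\.uuid\.altered\]")
REPLY_RE = re.compile(r"MsgQuestion: msg\.uuid\.altered reply=(\d+)")
DISK_RE = re.compile(r"DISK: Opening disks took (\d+) ms")


def summarize_vmware_log(lines):
    """Summarize the moved-or-copied question found in vmware.log lines."""
    summary = {"question_asked": False, "reply": None, "disk_open_seconds": None}
    for line in lines:
        if QUESTION_RE.search(line):
            summary["question_asked"] = True
        match = REPLY_RE.search(line)
        if match:
            # Per the excerpt, reply choice 1 corresponds to "I Moved It".
            summary["reply"] = int(match.group(1))
        match = DISK_RE.search(line)
        if match:
            summary["disk_open_seconds"] = int(match.group(1)) / 1000.0
    return summary


if __name__ == "__main__":
    sample = [
        "2019-02-24T04:52:25.477Z| vmx| I125: [msg.uuid.altered] This virtual machine might have been moved or copied.",
        "2019-02-24T04:55:46.538Z| vmx| I125: MsgQuestion: msg.uuid.altered reply=1",
        "2019-02-24T04:59:28.782Z| vmx| I125: DISK: Opening disks took 222203 ms.",
    ]
    print(summarize_vmware_log(sample))
```

To check a real log, pass the lines of the VM's vmware.log file to the function instead of the sample list.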

Environment

VMware vSphere ESXi 6.7
VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 6.5

Cause

NFSv3 uses disk-based locks to synchronize access across hosts. The delay seen from NFS is due to this NFSv3 locking mechanism.

Resolution

NFS v3 uses disk-based locks to provide exclusive access to VM files such as .vmx and .vmdk files. This is the expected behavior of NFS v3 locking in ESXi and will be observed in HA setups. The limitation comes from NFS v3 locking. In NFS 4.1, the NFS server manages all the locks.
NFS 3 locking on ESXi does not use the Network Lock Manager (NLM) protocol. Instead, VMware provides its own locking protocol. NFS 3 locks are implemented by creating lock files on the NFS server. Lock files are named .lck-file_id.
NFS 4.1 uses share reservations as a locking mechanism. Support for NFS 4.1 was introduced in ESXi 6.0.

For more information, refer to the VMware documentation: NFS File Locking
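
Because NFS 3 locks are files on the NFS server whose names begin with .lck-, the lock files in a VM directory can be listed directly. The following is a minimal sketch, assuming a Python interpreter is available and the VM directory is readable. The function name find_nfs_lock_files is hypothetical, and the script only lists file names; it does not decode which host holds a lock.

```python
import os
import sys


def find_nfs_lock_files(vm_dir):
    """Return the sorted names of entries in vm_dir that start with '.lck-'."""
    return sorted(
        name for name in os.listdir(vm_dir) if name.startswith(".lck-")
    )


if __name__ == "__main__":
    # Assumption: the VM directory is passed as the first argument,
    # for example a VM folder on the NFS datastore.
    if len(sys.argv) > 1:
        for lock_name in find_nfs_lock_files(sys.argv[1]):
            print(lock_name)
```

An empty result means no lock files are currently present in that directory.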



Workaround:
1. To identify and resolve the NFS lock, follow the KB article "Understanding the NFS .lck lock file" to determine which ESXi host and NFS file name a lock file refers to.

 For vSphere ESXi 7.0.x and later, the following script reports the details of each .lck file:

[root@ESXi 7.0:~] python ./usr/lib/vmware/vm-support/bin/nfsLockInfo.pyc
usage: ./usr/lib/vmware/vm-support/bin/nfsLockInfo.pyc <vmdir>)

2. To answer the VM question, refer to the VMware KB article "Changing or keeping a UUID for a moved virtual machine".

Additional Information

Impact/Risks:
None