This article addresses environments experiencing frequent vSphere HA failover alarms that occur approximately every 8 minutes with "Duplicate VM" detection messages in logs. This issue primarily affects VMs on NFS datastores and causes unexpected VM reboots, especially during or after vMotion operations. End users may report application slowness or disruptions as VMs are remediated repeatedly by HA.
You may observe the following symptoms in your vSphere environment:
- Frequent vSphere HA failover alarms occurring approximately every 8 minutes
- Multiple VMs affected across the cluster with remediation events
- Alarms showing as "yellow" for 30 seconds to 2 minutes before turning green again
- Log entries showing "Duplicate VM" detection in FDM logs
- Virtual machines being rebooted unexpectedly following vMotion operations
- Issues predominantly affecting VMs located on NFS datastores
- End users experiencing intermittent slowness or application disruptions
- Problem may be more prevalent during certain periods of the day (for example, between 12:00 UTC and 21:00 UTC)
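To check whether the alarms in your environment follow the roughly 8-minute cadence described above, you can compare the gaps between alarm times. The following is a minimal sketch, not a supported tool. It assumes you have already collected the alarm timestamps yourself (for example, from the cluster's event history). The function names and the 1-minute tolerance are illustrative assumptions.

```python
from datetime import datetime
from statistics import median


def alarm_gaps_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive alarm times."""
    ordered = sorted(timestamps)
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(ordered, ordered[1:])
    ]


def looks_periodic(timestamps, expected_minutes=8.0, tolerance_minutes=1.0):
    """Return True if the median gap is within the tolerance of the expected interval.

    The 1-minute default tolerance is an arbitrary assumption; adjust as needed.
    """
    gaps = alarm_gaps_minutes(timestamps)
    if not gaps:
        # Fewer than two alarms: no interval to evaluate.
        return False
    return abs(median(gaps) - expected_minutes) <= tolerance_minutes
```

Using the median rather than the mean keeps a single long quiet period (for example, overnight) from hiding an otherwise regular pattern.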
In the ESXi host /var/log/fdm.log file, you might see entries similar to the following:
Invoking GetLockOwnerForVmOnNfsDs on Duplicate VM locked file; path: /vmfs/volumes/[datastoreID]/[VM_Name]/[VM_Name].vmx.lck
Nfs lock file name to read lock owner details; filename /vmfs/volumes/[datastoreID]/[VM_Name]/.lck-[ID]
Duplicate VM lock owner identified; path: /vmfs/volumes/[datastoreID]/[VM_Name]/[VM_Name].vmx.lck, owner: host-[ID]
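To see which VMs are being flagged, and how often, you can scan a copy of fdm.log for the "Duplicate VM lock owner identified" message shown above. The sketch below is illustrative only: the regular expression is built from the sample entry in this article, it deliberately ignores whatever timestamp or thread prefix precedes the message, and the function names are hypothetical.

```python
import re
from collections import Counter

# Built from the sample "Duplicate VM lock owner identified" entry.
# Any prefix before the message (timestamp, thread info) is not matched.
OWNER_RE = re.compile(
    r"Duplicate VM lock owner identified; path: (?P<path>.+?), owner: (?P<owner>\S+)"
)


def find_duplicate_vm_events(lines):
    """Return a (lock file path, owner) tuple for each matching log line."""
    events = []
    for line in lines:
        match = OWNER_RE.search(line)
        if match:
            events.append((match.group("path"), match.group("owner")))
    return events


def count_by_path(events):
    """Count how many times each VM lock file path was flagged."""
    return Counter(path for path, _owner in events)
```

For example, after copying the log from the host, pass an open file handle to `find_duplicate_vm_events` and feed the result to `count_by_path` to rank the affected VMs by number of detections.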
This issue occurs due to a known bug in vSphere HA's split-brain detection mechanism with NFS datastores. When a VM migrates between hosts on NFS storage, there is a brief period where the lock ownership information appears ambiguous to the HA system. The HA component erroneously detects that the VM is running simultaneously on two hosts (a "split-brain" condition) and unnecessarily terminates one instance to "protect" the environment.
Specifically, the issue affects NFS datastores because of how they handle VM lock files differently than VMFS or vSAN datastores. During vMotion operations, vSphere HA may incorrectly identify VMs as being "duplicated" across hosts, triggering unnecessary failover remediation actions.
For detailed resolution steps, refer to the KB article "Virtual Machine powered-off after vMotion and HA Failover".
The resolution involves adding an advanced setting to disable duplicate VM detection in vSphere HA by setting `das.config.fdm.enableDupVmDetection = false` and then cycling vSphere HA off and on. The complete step-by-step instructions are available in the KB article linked above.
Warning: Disabling this setting also stops vSphere HA from detecting genuine split-brain conditions, so a real split-brain situation can develop undetected.
Important Notes:
This issue is distinct from actual host failures and specifically relates to the HA duplicate VM detection mechanism.