vMotion fails intermittently with the following error:
Failed to receive migration. The source detected that the destination failed to resume.
An error occurred restoring the virtual machine state during migration. NamespaceMgr could not lock the db file.
The vmware.log on the destination ESXi host shows the following messages:
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.checkpoint.migration.failedReceive] Failed to receive migration.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.namespaceMgr.noLock] NamespaceMgr could not lock the db file.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.checkpoint.mrestoregroup.failed] An error occurred restoring the virtual machine state during migration.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - Msg_Post: Error
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.checkpoint.mrestoregroup.failed] An error occurred restoring the virtual machine state during migration.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.namespaceMgr.noLock] NamespaceMgr could not lock the db file.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - [msg.checkpoint.migration.failedReceive] Failed to receive migration.
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - ----------------------------------------
[YYYY-MM-DDTHH:MM:SS] In(05) vmx - Module 'CheckpointLate' power on failed.
Note: To identify the correct vmware.log on the destination host, see Locating virtual machine log files on an ESXi host.
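The log signature above can be checked mechanically. The following is a minimal Python sketch, not part of the attached script, that scans the lines of a vmware.log for the message IDs shown in the excerpt. The function names `find_signature` and `matches_issue` are hypothetical, and the sketch assumes each message ID appears in square brackets as it does in the excerpt.

```python
import re

# Message IDs copied from the vmware.log excerpt in this article.
SIGNATURE_IDS = (
    "msg.namespaceMgr.noLock",
    "msg.checkpoint.mrestoregroup.failed",
    "msg.checkpoint.migration.failedReceive",
)

# Matches a bracketed message ID such as [msg.namespaceMgr.noLock].
_MSG_ID = re.compile(r"\[(msg\.[A-Za-z0-9_.]+)\]")


def find_signature(lines):
    """Return the set of signature message IDs found in the given log lines."""
    found = set()
    for line in lines:
        match = _MSG_ID.search(line)
        if match and match.group(1) in SIGNATURE_IDS:
            found.add(match.group(1))
    return found


def matches_issue(lines):
    """Return True when the namespace DB lock message appears in the log."""
    return "msg.namespaceMgr.noLock" in find_signature(lines)
```

To use it, pass any iterable of lines, for example an open file handle for a copy of the destination host's vmware.log. A match only indicates that the log contains the same messages; it does not confirm the host build is affected.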
The issue only impacts VMs provisioned on shared VMFS, vSAN, and vVols datastores that use a namespace DB. It does not impact VMs on NFS datastores.
A namespace DB is used under certain environment conditions, such as SDMP from vROps or guest introspection for NSX.
VMware ESXi 8.0 U2b or later (23305546)
VMware ESXi 7.0 U3q (23794027)
The issue is caused by the locking mechanism on the namespace DB and has been observed primarily on VMs with hardware version 19 or earlier.
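The conditions described so far (datastore type, use of a namespace DB, and VM hardware version) can be combined into a rough triage rule. The Python sketch below is illustrative only: the function name `assess_vm`, the datastore type labels, and the three result strings are assumptions made for this example, and the sketch does not check the ESXi build. The script attached to this article remains the intended check.

```python
# Datastore type labels are hypothetical; map them from your own inventory data.
# "VMFS" here stands for a shared VMFS datastore.
AFFECTED_DATASTORE_TYPES = {"VMFS", "VSAN", "VVOL"}


def assess_vm(datastore_type, uses_namespace_db, hardware_version):
    """Classify a VM against the conditions stated in this article.

    Returns "not expected" when the VM is on another datastore type (such as
    NFS) or does not use a namespace DB, "more likely" when it also has
    hardware version 19 or earlier, and "possible" otherwise.
    """
    on_affected_datastore = datastore_type.upper() in AFFECTED_DATASTORE_TYPES
    if not uses_namespace_db or not on_affected_datastore:
        return "not expected"
    # The issue has been observed primarily, not exclusively, on version <= 19.
    return "more likely" if hardware_version <= 19 else "possible"
```

For example, `assess_vm("vsan", True, 17)` returns "more likely", while any VM on an NFS datastore returns "not expected".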
The issue has been fixed in VMware ESXi 7.0 U3r and 8.0 U3c.
[Release Notes]
Workaround
The issue is intermittent, and no reliable or permanent workaround is known. Attempt one of the two workarounds:
A script attached to this Knowledge Base article checks for environments that are likely to encounter the vMotion failure with the error "NamespaceMgr could not lock the db file".
NOTE:
The ESXi host version check performed by this script may not provide reliable information for hot patches. Therefore, impacted version verification with Broadcom support is recommended for hot patches.
Steps to execute the script
1. This script was tested with the following Microsoft PowerShell and VMware PowerCLI versions:
PS C:\Users\TestUser> $PSVersionTable.PSVersion
Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      22621  4391
PS C:\Users\TestUser> Get-Module -Name VMware.PowerCLI -ListAvailable
    Directory: C:\Users\TestUser\Documents\WindowsPowerShell\Modules
ModuleType Version    Name                ExportedCommands
---------- -------    ----                ----------------
Manifest   12.4.0.... VMware.PowerCLI
2. To uninstall an old version of VMware PowerCLI, run:
(Get-Module -Name VMware.PowerCLI -ListAvailable).RequiredModules | Uninstall-Module -Force
Get-Module -Name VMware.PowerCLI -ListAvailable | Uninstall-Module -Force
3. To install the latest VMware PowerCLI from PowerShell repository 'PSGallery', run:
Install-Module -Name VMware.PowerCLI -Scope CurrentUser
This installs VMware PowerCLI in the current user's module directory ($home\Documents\PowerShell\Modules, or $home\Documents\WindowsPowerShell\Modules on Windows PowerShell 5.1), so the script does not need to run in an administrator PowerShell session.
4. You may also need to run this command before running any script:
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
5. To be prompted when the server certificate is not trusted, run:
Set-PowerCLIConfiguration -Scope Session -InvalidCertificateAction Prompt
6. Run the script by specifying its path. For example, if the script is located in the \tmp directory, run:
.\tmp\nsmgr-vmotion-precheck.ps1
7. To check an existing vCenter server for impacted ESXi hosts and VMs, run:
.\<directory location of the script>\nsmgr-vmotion-precheck.ps1
.\<directory location of the script>\nsmgr-vmotion-precheck.ps1 -CheckVMsOnly (to check the impacted VMs only)
When run, the script prompts for the vCenter Server IP address, user name (e.g. administrator@vsphere.local), and password, and then checks for impacted hosts and VMs.