Run the following commands on each ESXi host when a large number of VMs are affected by this issue in a vSAN Stretched Cluster, or when vCenter runs as a VM on the vSAN datastore and cannot be reached because of the aforementioned IP address issue:
- SSH to the host and log in as root.
- Run the esxcli vm process list command to get a list of the virtual machines running on the host:
Example:
$ esxcli vm process list
vm1
World ID: 1001723832
Process ID: 0
VMX Cartel ID: 1001723827
UUID: 42 29 08 44 ## ## ## ##-## ## ## ## ee 4a 66 bf
Display Name: vm1
Config File: /vmfs/volumes/vsan:527b71e8########-######3d219c68b8/########-####-####-####-########20d4/vm1.vmx
- Run the esxcli network nic list command to get the MAC addresses of the physical network interfaces on this host:
Example:
$ esxcli network nic list
Name    PCI Device    Driver    Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  --------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------
vmnic0  0000:0b:00.0  nvmxnet3  Up            Up           10000  Full    ##:##:##:##:##:9e  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic1  0000:13:00.0  nvmxnet3  Up            Up           10000  Full    ##:##:##:##:##:4d  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic2  0000:1b:00.0  nvmxnet3  Up            Up           10000  Full    ##:##:##:##:##:15  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic3  0000:04:00.0  nvmxnet3  Up            Up           10000  Full    ##:##:##:##:##:db  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
- Run the vmfsfilelockinfo command against each VM's VMX file path to find out which MAC address owns the lock:
Example:
$ /bin/vmfsfilelockinfo -p /vmfs/volumes/vsan:527b71e8########-######3d219c68b8/########-####-####-####-########20d4/vm1.vmx
vmfsfilelockinfo Version 2.0
Looking for lock owners on "vm1.vmx"
"vm1.vmx" is locked in Exclusive mode by host having mac address ['##:##:##:##:##:15']
Please configure ESXi firewall to connect to Virtual Center
Total time taken : 1.0551715530455112 seconds.
Note: If the MAC address belongs to the local host, the running virtual machine still holds its lock. Otherwise, the virtual machine has lost the lock and it is safe to terminate it. vSphere HA may already have restarted the virtual machine on another host; if not, HA will try to restart it after the virtual machine is terminated.
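The comparison described in the note can be sketched as two small shell helpers. This is only an illustration: the function names lock_owner_macs and is_local_mac are hypothetical, and the sketch assumes that vmfsfilelockinfo prints the owner MAC addresses inside square brackets, as in the example output above.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not ESXi commands.

# Read vmfsfilelockinfo output on stdin and print each lock-owner MAC address
# found inside the [...] list, one per line.
lock_owner_macs() {
    grep "mac address" | grep -o "\[.*\]" |
        grep -io "[0-9a-f][0-9a-f]\(:[0-9a-f][0-9a-f]\)\{5\}"
}

# Return 0 if MAC address $1 appears (case-insensitively) in file $2, which is
# expected to hold saved `esxcli network nic list` output.
is_local_mac() {
    grep -qiF -- "$1" "$2"
}

# Possible use on a host (VMX_FILE is a placeholder for a real .vmx path):
#   esxcli network nic list > /tmp/mac.list
#   for mac in $(/bin/vmfsfilelockinfo -p "$VMX_FILE" | lock_owner_macs); do
#       is_local_mac "$mac" /tmp/mac.list && echo "Lock is held locally by $mac"
#   done
```

If no owner MAC address matches a local interface, the virtual machine no longer holds its lock on this host.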
You may run the following shell script on the ESXi host to automate the steps above:
#!/bin/sh
# Flag running VMs whose VMX lock is held by a MAC address that is not on this host.
esxcli network nic list > /tmp/mac.list
esxcli vm process list > /tmp/vm.list
while IFS= read -r line; do
    case "$line" in
        "")
            ;;  # blank separator between VM entries
        *"World ID:"*)
            VM_WLD_ID=$(echo "$line" | grep -o "[0-9][0-9]*")
            ;;
        *"Config File:"*)
            VMX_FILE=$(echo "$line" | grep -o "/vmfs/.*")
            LOCKING_MAC=$(/bin/vmfsfilelockinfo -p "$VMX_FILE" | grep "mac address")
            STRIP_MAC=$(echo "$LOCKING_MAC" | grep -o "\[.*\]" | grep -o "[0-9a-f:]*")
            if [ -z "$STRIP_MAC" ]; then
                # An empty pattern would match every line of mac.list, so report it instead.
                echo " Warning: could not determine the lock owner for $VMX_FILE"
            elif ! grep -q "$STRIP_MAC" /tmp/mac.list; then
                echo " Error: VM does not hold the lock. You may run this command to terminate the VM:"
                echo " esxcli vm process kill --type=hard --world-id=$VM_WLD_ID"
            fi
            ;;
        *:*)
            ;;  # other "Key: value" fields are not needed
        *)
            echo "Checking VM: $line"  # a line without a colon is the VM name
            ;;
    esac
done < /tmp/vm.list
Example:
$ ./lost_lock_vm.sh
Checking VM: vm1
Error: VM does not hold the lock. You may run this command to terminate the VM:
esxcli vm process kill --type=hard --world-id=1001723832
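After reviewing a suggested command, you may want to confirm that the virtual machine process is really gone. The following is a minimal sketch, not part of the original procedure: kill_and_verify is a hypothetical helper name, and the KILL_WAIT grace period (default 5 seconds) is an assumption that may need adjusting.

```shell
#!/bin/sh
# Sketch only: kill_and_verify is a hypothetical helper, not an ESXi command.
# Hard-kill the VM with world ID $1, then check that the world ID no longer
# appears in `esxcli vm process list`.
kill_and_verify() {
    world_id=$1
    esxcli vm process kill --type=hard --world-id="$world_id" || return 1
    sleep "${KILL_WAIT:-5}"   # assumed grace period before re-checking
    # Match the exact world ID; "VMX Cartel ID:" lines are not matched.
    if esxcli vm process list | grep -q "World ID: ${world_id}[^0-9]*\$"; then
        echo "World $world_id is still running"
        return 1
    fi
    echo "World $world_id terminated"
}

# Possible use, with the world ID from the example above:
#   kill_and_verify 1001723832
```

Run it only for virtual machines that the lock check has reported as no longer holding their lock.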