Encrypted Virtual Machines enter VM_STATE_LOCKED on Power Cycle after a HA Failover when the failed Host is removed from the Cluster.

Article ID: 334912

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Encrypted Virtual Machines enter VM_STATE_LOCKED after a Power Cycle.
  • Before the Power Cycle, the affected VMs had been restarted on a different Host by an HA Failover.
  • The Host which suffered the HA Failover has been removed from the Cluster.


Environment

VMware vSphere ESXi 6.7

Cause

The key_usage count for each host is recorded in the vCenter Database.

For each VM that is using a key, the key_usage count of that key is increased by 1 for the Host where the VM is located.

When an HA failover happens, HA restarts the VMs on other Hosts within the HA cluster.

In vSphere versions prior to 7.0 Update 1, however, the failover does not update the key_usage count to reflect the new Hosts for those VMs.

If the failed Host is then removed from the Cluster, the key_usage count is reduced for all keys recorded as "in use" by that Host, and keys whose usage drops to zero are removed from all Hosts in the cluster.

The VMs whose keys were removed then enter a LOCKED state on their next Power Cycle.
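
The sequence described above can be illustrated with a small toy model. This is only a sketch for reasoning about the counts, not vCenter's actual implementation: the class KeyUsageModel, its methods, and the host, VM and key names are all hypothetical, and the update_counts flag stands in for the behaviour this article attributes to 7.0 Update 1.

```python
from collections import defaultdict


class KeyUsageModel:
    """Toy model of per-host key usage counts (hypothetical, not vCenter code)."""

    def __init__(self):
        # usage[host][key_id] -> number of VMs recorded against that host
        self.usage = defaultdict(lambda: defaultdict(int))
        # vm name -> (key_id, host where the VM is actually running)
        self.vms = {}

    def place_vm(self, vm, key_id, host):
        # Each VM using a key raises that key's count by 1 on its host.
        self.usage[host][key_id] += 1
        self.vms[vm] = (key_id, host)

    def ha_failover(self, vm, new_host, update_counts):
        key_id, old_host = self.vms[vm]
        self.vms[vm] = (key_id, new_host)
        if update_counts:
            # Assumed model of the fixed behaviour: the count follows the VM.
            self.usage[old_host][key_id] -= 1
            self.usage[new_host][key_id] += 1

    def remove_host(self, host):
        """Drop the host's counts and return keys left with zero total usage."""
        released = self.usage.pop(host, {})
        orphaned = set()
        for key_id in released:
            total = sum(counts.get(key_id, 0) for counts in self.usage.values())
            if total == 0:
                orphaned.add(key_id)
        return orphaned

    def locked_on_power_cycle(self, removed_keys):
        # VMs whose key was removed from the cluster cannot be unlocked on restart.
        return sorted(vm for vm, (key_id, _) in self.vms.items()
                      if key_id in removed_keys)


def simulate(update_counts):
    model = KeyUsageModel()
    model.place_vm("enc_test_vm", "KEY-1", "host1")
    model.place_vm("other_vm", "KEY-2", "host2")
    model.ha_failover("enc_test_vm", "host2", update_counts=update_counts)
    removed = model.remove_host("host1")
    return model.locked_on_power_cycle(removed)


print(simulate(update_counts=False))  # counts stay on the failed host
print(simulate(update_counts=True))   # counts follow the VM
```

With update_counts=False the model reports enc_test_vm as locked, because its key is still counted only against the removed host; with update_counts=True no VM is affected.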

Resolution


vCenter 7.0 Update 1 adds functionality to vSphere HA that updates the key reference count when HA moves VMs to a new Host.

Locked VMs can be unlocked from vCenter.

https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-CB459722-C7B6-4EA3-B8D3-EB44BCF23077.html

Alternatively, the PowerCLI cmdlet Unlock-VM can be used to unlock the VMs.

https://developer.vmware.com/docs/powercli/latest/vmware.vimautomation.security/commands/unlock-vm/#Default

Workaround:
In vSphere 6.x, to avoid the VMs becoming LOCKED, re-key the VMs that were failed over by HA before removing the failed Host from the Cluster.

1) Identify the host which has crashed or become unresponsive and needs to be removed from the cluster, for example 'examplehost1.example.com'.

2) Open a console or SSH to the vCSA as root and query the database for the associated key_ids which are owned by the host.

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

VCDB=# select usage_count,dns_name, host_id, vpx_host_crypto_keys.crypto_key_id from vpx_host_crypto_keys inner join vpx_host on ( vpx_host_crypto_keys.host_id = vpx_host.id) where vpx_host.dns_name = 'examplehost1.example.com';

This will output all associated keys and their usage_count.

 usage_count |         dns_name         | host_id |    crypto_key_id
-------------+--------------------------+---------+----------------------
           0 | examplehost1.example.com |    6521 | 0F4xxxxxxxxxxxxx952F
           0 | examplehost1.example.com |    6521 | 404CxxxxxxxxxxxxF963
           1 | examplehost1.example.com |    6521 | 6AxxxxxxxxxxxxxxxxD6

 
3) If all keys have usage_count = 0 then it is safe to skip the rest of the steps and remove the Host from the Cluster.
 

4) If there are keys with usage_count greater than 0 then you must identify the VMs which are using them.

Using PowerCLI:

PS C:\Users\xzy> connect-viserver <impacted vCenter name>

PS C:\Users\xzy> get-cluster <impacted cluster name> |get-vm |? {$_.ExtensionData.Config.keyID.keyid -eq '6AxxxxxxxxxxxxxxxxD6'}

Name        PowerState Num CPUs MemoryGB
----        ---------- -------- --------
enc_test_vm PoweredOff 1        2.000


5) Rekey the impacted VMs

PS C:\Users\xzy> $kp = Get-keyProvider kms_cluster_id
PS C:\Users\xzy> get-vm -name enc_test_vm | set-vm -keyprovider $kp

Afterwards, check the VM's new keyid to confirm that it has been rekeyed.

PS C:\Users\xzy> Get-SecurityInfo -Entity enc_test_vm

You can also query the usage_count again in VCDB to confirm that examplehost1.example.com has 0 usage count.

 
6) Remove the host from inventory/cluster
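
The check in steps 3 and 4 can also be done on saved query output rather than by eye. The following is a minimal sketch: the function name keys_in_use is hypothetical, and it assumes the default aligned psql table format shown in step 2 (the "(3 rows)" footer in the sample is assumed psql output, not taken from this article).

```python
def keys_in_use(psql_output, dns_name):
    """Return {crypto_key_id: usage_count} for the host's rows with usage_count > 0."""
    in_use = {}
    for line in psql_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header, the separator line, blank lines and the row-count footer.
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        count, host, _host_id, key_id = cells
        if host == dns_name and int(count) > 0:
            in_use[key_id] = int(count)
    return in_use


# Sample output in the layout produced by the query in step 2.
SAMPLE = """\
 usage_count |         dns_name         | host_id |    crypto_key_id
-------------+--------------------------+---------+----------------------
           0 | examplehost1.example.com |    6521 | 0F4xxxxxxxxxxxxx952F
           0 | examplehost1.example.com |    6521 | 404CxxxxxxxxxxxxF963
           1 | examplehost1.example.com |    6521 | 6AxxxxxxxxxxxxxxxxD6
(3 rows)
"""

pending = keys_in_use(SAMPLE, "examplehost1.example.com")
if pending:
    print("Rekey the VMs using these keys before removing the host:", sorted(pending))
else:
    print("All usage counts are 0; the host can be removed.")
```

An empty result corresponds to step 3 (safe to remove the Host); any key returned is one to look up with the PowerCLI filter in step 4.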

Additional Information

https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-CB459722-C7B6-4EA3-B8D3-EB44BCF23077.html

https://developer.vmware.com/docs/powercli/latest/vmware.vimautomation.security/commands/unlock-vm/#Default