SRM Recovery Plan Test Warns for Encrypted Virtual Machines

Article ID: 418103

Products

VMware Live Recovery

Issue/Introduction

During a VMware Site Recovery Manager (SRM) recovery plan test, virtual machines (VMs) that use VM Encryption on the protected site (potentially including VMs with a vTPM) generate warnings or fail to complete recovery on the recovery site. The warnings report that virtual machine configuration files could not be updated and that required encryption keys are not available.

This issue manifests with messages such as:

  • "Failed to update embedded paths in virtual machine file '/vmfs/volumes/…/VMname/VMname.vmx'. Invalid virtual machine configuration."
  • "The dictionary is encrypted and the required key is not available."
  • "Failed to resolve key… with Trusted Key Provider."
  • "Trust Authority Components not configured."

The SRM appliance on the recovery site logs the following for the affected VMs:

-->       vmFile = "/vmfs/volumes/.../VMname/VMname.vmx",
-->       fault = (vim.fault.InvalidVmConfig) {
-->          faultCause = (vmodl.MethodFault) null,
-->          faultMessage = <unset>,
-->          property = "snapshot.dict"
-->          msg = "Invalid virtual machine configuration."


hostd.log on the recovery host:

2025-10-29T14:27:25.778Z In(166) Hostd[2099670]: [Originator@6876 sub=Libs opID=d7c7df28-6dd6-47cf-abd4-############-test:d157:c531:d4f9:06bd:ccee:7dbf:6583-54-01-5e-41c0 sid=52c9db04 user=vpxuser:VSPHERE.LOCAL\SRM-40ca508c-b09d-49fa-a246--############] [msg.dictionary.unlock.noKey] The dictionary is encrypted and the required key is not available."

kmxa.log on the recovery host:

2025-10-29T15:34:32.162Z Er(163) kmxa[2099106]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-56] Failed to resolve key 7fab1d9ead6c4e118720bc505081a33629e9ece2423a4987b1e64b0700bd46bc/prod_2931_skp_gen_01 with Trusted Key Provider.
2025-10-29T15:34:32.485Z Er(163) kmxa[2099102]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-57] Trust Authority Components not configured.
2025-10-29T15:34:32.485Z Er(163) kmxa[2099102]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-57] Failed to decrypt key 03289668c891496fa1d3c26af0d5eba03e480156f65546278a770a0a5a7b27a5/prod_2931_skp_gen_01: Error:
2025-10-29T15:34:32.485Z Er(163) kmxa[2099102]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-57]    com.vmware.vapi.std.errors.error
2025-10-29T15:34:32.485Z Er(163) kmxa[2099102]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-57] Messages:
2025-10-29T15:34:32.485Z Er(163) kmxa[2099102]: [Originator@6876 sub=Libs opID=resolveKey-52420e9f-63d5-d04b-12b4-############-57]    com.vmware.esx.trusted_infrastructure.trust_authority_services.not_configured<Incomplete or missing Trust Authority Components configuration.
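
When triaging copies of these logs, it can help to count how often each of the messages above appears. The following is a minimal sketch, not a VMware tool: the function name `count_signatures` and the category labels are hypothetical, and the search strings are copied from the excerpts in this article.

```python
# Substrings copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "embedded_paths": "Failed to update embedded paths",
    "missing_key": "The dictionary is encrypted and the required key is not available",
    "key_resolve": "Failed to resolve key",
    "trust_authority": "Trust Authority Components not configured",
}


def count_signatures(lines):
    """Return how many lines contain each known error message."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts
```

The function accepts any iterable of strings, so an open file object for a local copy of hostd.log or kmxa.log can be passed directly. Nonzero counts for `key_resolve` or `trust_authority` would point to the key provider checks described under Resolution.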

Environment

  • VMware Live Recovery deployed and configured
  • Virtual Machines with VM Encryption enabled

Cause

The recovery ESXi hosts cannot access, resolve, or decrypt the encryption keys needed for the affected VMs' configuration files (specifically the .vmx file and its embedded snapshot.dict property). During a test recovery, SRM performs a simulation that may not engage all of the key management mechanisms that a full failover uses for encrypted VMs.

Resolution

Verify and, if necessary, reconfigure the Key Management System (KMS) and Trust Authority components on the recovery site, then run a controlled test.

  1. Verify KMS and Trust Authority Components Configuration on the Recovery Site:

    • Confirm that VM Encryption is intentionally enabled for these VMs on the protected site.
    • Ensure a Key Management System (KMS) is deployed, operational, and accessible on the recovery site. This includes checking network connectivity (firewall rules, routing) between recovery ESXi hosts and the KMS.
    • Verify that the recovery ESXi hosts are correctly configured to trust and communicate with the KMS. This typically involves:
      • Navigating to vCenter Server -> Cluster -> Configure -> Key Management Servers on the recovery site.
      • Ensuring the KMS is registered and its status is "Normal."
      • Confirming that the ESXi hosts within the recovery cluster are properly associated with the KMS.
    • Check whether the Trust Authority Components are correctly configured on the recovery ESXi hosts, since the kmxa.log errors report them as not configured. This is vital for environments using vSphere Native Key Provider or an external KMS requiring specific trust chains. Consult VMware documentation for your specific vSphere and KMS version for detailed configuration steps.
    • If any KMS or Trust Authority component is misconfigured or inaccessible, correct these issues.
  2. Understand vTPM vs. VM Encryption:

    • vTPM provides a virtual hardware TPM for the guest OS, and its state file is often secured by VM Encryption. The errors (especially the snapshot.dict property and the key resolution failures) point to the underlying VM Encryption mechanism, which relies on the KMS. Focus troubleshooting on the KMS setup rather than solely on vTPM functionality.
  3. Perform another SRM Recovery Plan Test:

    • After verifying and correcting any KMS/Trust Authority configuration issues, run the SRM recovery plan test again to see if the warnings persist.
  4. Consider a Controlled Full Failover (Advanced/Cautionary Step):

    • SRM test recoveries are non-disruptive but may not fully exercise all encrypted VM recovery mechanisms. A full failover performs more comprehensive actions, which might include re-registering the encrypted VM with the recovery site's KMS or even re-encrypting the VM on the target site.
    • If possible and acceptable within your environment (e.g., during a maintenance window with a non-critical VM and a clear rollback plan), attempt a real (non-test) failover for just one of the affected VMs. This can definitively determine if the issue is specific to test recovery limitations or a fundamental problem that would affect a real disaster.
    • Only attempt this if you are comfortable with the potential downtime and have a well-rehearsed rollback strategy. If a full failover succeeds without these warnings, it suggests the issues are specific to the test recovery phase and can potentially be tolerated for test runs, provided the KMS is confirmed fully operational on the recovery site.
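
The network connectivity check from step 1 can be approximated with a plain TCP connection test. This is a minimal sketch under stated assumptions: the function name `kms_port_reachable` is hypothetical, port 5696 is the IANA-registered KMIP port and may not match the port your KMS actually uses, and the result only reflects reachability from the machine where the script runs, which may differ from the recovery ESXi hosts' network path.

```python
import socket


def kms_port_reachable(host, port=5696, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: 5696 is the registered KMIP port; substitute the port
    configured for your KMS if it differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (the hostname is a placeholder, not a real KMS):
# kms_port_reachable("kms.example.com")
```

A successful connection shows only that the port is open. It does not validate certificates, trust relationships, or the key provider configuration, so the vCenter Server checks in step 1 are still required.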

Additional Information

  • The difference in behavior between SRM test recoveries and actual failovers for encrypted VMs is a known characteristic. Test recoveries prioritize non-disruptiveness, while full failovers execute the complete recovery workflow, which may include more robust handling of encryption keys and VM registration with the target KMS.
  • The kmxa component is critical for ESXi hosts to interact with Key Management Servers. Any errors in its logs related to key resolution or trust authority indicate a core problem with the host's ability to handle encrypted workloads.
  • Always consult VMware documentation specific to your vSphere and SRM versions for the most up-to-date best practices for VM Encryption and KMS integration.
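
To see which keys the recovery host failed to resolve or decrypt, the kmxa.log messages quoted earlier can be parsed with a regular expression. This is a minimal sketch based only on the two message shapes shown in this article; the function name `extract_key_failures` is hypothetical, and treating the text after the slash as a key provider identifier is an assumption rather than documented log format.

```python
import re

# Matches the two kmxa messages quoted in this article:
#   "Failed to resolve key <hex id>/<suffix> with Trusted Key Provider."
#   "Failed to decrypt key <hex id>/<suffix>: Error:"
# Assumption: <suffix> identifies the key provider; this is not confirmed.
KEY_FAILURE = re.compile(r"Failed to (resolve|decrypt) key ([0-9a-f]+)/([^\s:]+)")


def extract_key_failures(lines):
    """Return (action, key_id, suffix) tuples for each matching log line."""
    results = []
    for line in lines:
        match = KEY_FAILURE.search(line)
        if match:
            results.append(match.groups())
    return results
```

Comparing the extracted suffixes against the key providers registered on the recovery site may show whether the provider the VM was encrypted with is missing there.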