VMware Live Recovery: Active VM in Protected Site shows as placeholder in vCenter

Products

VMware Live Recovery
VMware Site Recovery Manager 8.x

Issue/Introduction

Symptoms:

After a failover and a subsequent failback, real (production) virtual machines at the protected site are reported in the Summary tab as managed by com.vmware.vcDr or by the VMware vCenter Site Recovery Manager Extension.

Environment

VMware Live Recovery 9.x
VMware Live Recovery 8.x

Cause

Site Recovery Manager uses the vm.config.managedBy property to claim certain virtual machines as placeholder virtual machines on the failover vCenter Server site. At the time of actual failover, these virtual machines become real (production) virtual machines.

To do this, SRM invokes a vm.reconfigure operation on the placeholder virtual machines to clear the vm.config.managedBy property. vCenter Server clears this property from its cache, but the value is not cleared from the vCenter Server database.

If vCenter Server restarts for any reason, or if there is a VM reconfiguration event (power ON/OFF), it reads the VPX_VM table and repopulates the vm.summary.config.managedBy property with the stale managedBy value. As a result, the real virtual machines are incorrectly reported as placeholder virtual machines.

Resolution

Note: Before making any changes, ensure you take a proper backup or snapshot of vCenter.

For a standalone vCenter, create a snapshot without memory.
For a vCenter in Enhanced Linked Mode (ELM), an offline snapshot is strongly recommended.

When multiple vCenters are part of the same Single Sign-On (SSO) domain (Enhanced Linked Mode), failing to take offline snapshots on all nodes beforehand can lead to corruption of the vmdird database.

Fix #1:

Identify the VM ID
- In vCenter, click on the impacted VM from the inventory.
- Check the browser URL; the VM ID appears in the format vm-XXXXX, where XXXXX is a number.
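As an optional aid, the VM ID can be pulled out of a copied URL with a small shell function. This is a minimal sketch: the function name `extract_vm_id` and the sample URL are hypothetical and used only for illustration, and the exact URL layout may differ between vSphere Client versions.

```shell
#!/bin/sh
# Print the first token of the form vm-<digits> found in the given URL.
extract_vm_id() {
    printf '%s\n' "$1" | grep -oE 'vm-[0-9]+' | head -n 1
}

# Hypothetical URL for illustration only; paste the one from your own browser.
url='https://vc.example.com/ui/app/vm;nav=v/urn:vmomi:VirtualMachine:vm-1234:00000000-0000-0000-0000-000000000000/summary'
extract_vm_id "$url"
```

If the function prints nothing, the URL does not contain a VM ID and the VM should be selected again in the inventory.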
Access the Managed Object Browser (MOB)
- Open a new browser tab and enter the following URL: https://<VC-FQDN-or-IP>/mob/?moid=vm-XXXXX&method=reconfigure
  - Replace <VC-FQDN-or-IP> with your vCenter FQDN or IP address.
  - Replace vm-XXXXX with the VM ID identified above.
  - When prompted, log in using the vCenter SSO administrator account (by default, administrator@vsphere.local).
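To avoid typos when assembling the MOB URL by hand, it can be generated from the two values. The sketch below assumes a hypothetical helper name, `build_mob_url`, and a placeholder host, `vc.example.com`; it only prints the URL and does not contact vCenter.

```shell
#!/bin/sh
# Build the MOB reconfigure URL from a vCenter address and a VM ID.
build_mob_url() {
    vc_host=$1
    vm_id=$2
    # Accept only IDs of the form vm-<digits>.
    if ! printf '%s\n' "$vm_id" | grep -qE '^vm-[0-9]+$'; then
        echo "invalid VM ID: $vm_id" >&2
        return 1
    fi
    printf 'https://%s/mob/?moid=%s&method=reconfigure\n' "$vc_host" "$vm_id"
}

# Placeholder host and ID for illustration only.
build_mob_url vc.example.com vm-1234
```

The printed URL can then be pasted into the new browser tab.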
Reconfigure the VM
- In the page that opens, locate the value field.
- Delete the existing contents.
- Paste the following XML code into the field:
```
<spec>
  <managedBy>
    <extensionKey></extensionKey>
    <type></type>
  </managedBy>
</spec>
```
- Click the Invoke Method link to apply the changes.
Verify
- Refresh the page in the browser to confirm the changes are reflected.
Recreate the Placeholder VM
- In the DR site vCenter inventory, delete the existing placeholder VM.
- Open the SRM UI and navigate to the impacted VM under replication.
- Click Recreate Placeholder to regenerate the placeholder VM.

Fix #2:

Connect to vCenter
- Log in via SSH to the vCenter Server where the impacted VM resides.
Identify the VM ID
- Obtain the VM ID from the URL as described in Fix #1.
Verify VM details in the database
- Run the following SQL query against the vPostgres database, replacing XXXXX with the numeric part of the VM ID identified above (for vm-XXXXX, use XXXXX):
```
echo "select id, file_name, managed_by_ext_key, managed_by_type from vpx_vm where id=XXXXX;" | /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
```
- Review the output to confirm the VM entry.
Stop vCenter services
- If the vCenter is standalone, stop only the vpxd service:
```
service-control --stop vpxd
```
- If the vCenter is part of Enhanced Linked Mode (ELM), stop both the vpxd and vmdird services on all vCenters in the ELM domain to prevent replication synchronization:
```
service-control --stop vpxd && service-control --stop vmdird
```
Update the VM record in the database
- Run the following SQL query to clear the managed_by_ext_key and managed_by_type fields for the impacted VM, replacing XXXXX with the numeric part of the VM ID:
```
echo "UPDATE vpx_vm SET managed_by_type = NULL, managed_by_ext_key = NULL WHERE id = XXXXX;" | /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
```
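Because this statement modifies the vCenter database directly, it may help to generate it from a validated ID instead of editing the placeholder by hand. The following is a sketch only: the function names `numeric_vm_id` and `clear_managed_by_sql` are hypothetical, and the code merely prints the statement shown above without running it.

```shell
#!/bin/sh
# Strip an optional vm- prefix and require that the remainder is purely numeric.
numeric_vm_id() {
    id=${1#vm-}
    if ! printf '%s\n' "$id" | grep -qE '^[0-9]+$'; then
        echo "invalid VM ID: $1" >&2
        return 1
    fi
    printf '%s\n' "$id"
}

# Print the UPDATE statement for the given VM ID; nothing is executed here.
clear_managed_by_sql() {
    id=$(numeric_vm_id "$1") || return 1
    printf 'UPDATE vpx_vm SET managed_by_type = NULL, managed_by_ext_key = NULL WHERE id = %s;\n' "$id"
}

# Placeholder ID for illustration only.
clear_managed_by_sql vm-1234
```

After reviewing the printed statement, it can be piped to psql in the same way as the command above, for example `clear_managed_by_sql vm-1234 | /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres`.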
Restart vCenter services
- Start the services that were stopped in the "Stop vCenter services" step, on every vCenter where they were stopped.
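The start commands mirror the stop commands used earlier. The helper below is a hypothetical sketch (the name `start_commands` is not part of any VMware tooling) that prints the `service-control --start` commands for each case without running them. Starting vmdird before vpxd is an assumption made here, since the article does not specify an order.

```shell
#!/bin/sh
# Print the commands that start the services stopped earlier.
# mode: "standalone" (only vpxd was stopped) or "elm" (vpxd and vmdird were stopped).
start_commands() {
    case "$1" in
        standalone)
            echo "service-control --start vpxd"
            ;;
        elm)
            # Assumed order: directory service first, then vpxd.
            echo "service-control --start vmdird"
            echo "service-control --start vpxd"
            ;;
        *)
            echo "usage: start_commands standalone|elm" >&2
            return 1
            ;;
    esac
}

start_commands elm
```

In an ELM environment, the printed commands would be run on each vCenter in the domain.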
Recreate the placeholder VM
- Follow the "Recreate the Placeholder VM" step in Fix #1 to delete the placeholder VM and recreate it from the SRM UI.
Verify
- Refresh the browser page to confirm that the changes are successfully applied.