RAV Migration Fails for high churn VM with Error: "Not able to get group instance post snapshot"

Article ID: 403533

Products

VMware HCX

Issue/Introduction

Migrating a VM using RAV (Replication Assisted vMotion) fails repeatedly with the following error: "Not able to get group instance post snapshot."
The same error is displayed in the HCX UI.

The following entries in app.log under /common/logs/admin/ show high data churn:

<timestamps> UTC [ReplicationTransferService_SvcThread-188921, Ent: HybridityAdmin, , TxId: ####-####-53753504ad43] INFO  c.v.h.s.r.utils.ReplicationUtil- The Virtual Machine '<vm-name>' corresponding to the transfer '####-####-e10227f3661d' is high churn VM: Detected data churn 87902 KBps is overshooting available bandwidth 25000 KBps
<timestamps> UTC [ReplicationTransferService_SvcThread-189014, Ent: HybridityAdmin, , TxId: ####-####-53753504ad43] INFO  c.v.h.s.r.utils.ReplicationUtil- The Virtual Machine '<vm-name>' corresponding to the transfer '####-####-e10227f3661d' is high churn VM: Detected data churn 95482 KBps is overshooting available bandwidth 25000 KBps
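To find these entries quickly in a large app.log, the message text can be matched with a regular expression. The following is a minimal Python sketch, not an HCX tool: the pattern is derived only from the two log lines shown above, and the function names parse_churn_line and scan_log are hypothetical. If the message wording differs in another HCX version, the pattern would need adjusting.

```python
import re

# Pattern built from the message text in the app.log excerpt above.
# Assumption: other HCX versions may word this message differently.
CHURN_RE = re.compile(
    r"The Virtual Machine '(?P<vm>[^']*)' corresponding to the transfer "
    r"'(?P<transfer>[^']*)' is high churn VM: Detected data churn "
    r"(?P<churn>\d+) KBps is overshooting available bandwidth "
    r"(?P<bandwidth>\d+) KBps"
)


def parse_churn_line(line):
    """Return churn details for one log line, or None if it does not match."""
    match = CHURN_RE.search(line)
    if match is None:
        return None
    churn = int(match.group("churn"))
    bandwidth = int(match.group("bandwidth"))
    return {
        "vm": match.group("vm"),
        "transfer": match.group("transfer"),
        "churn_kbps": churn,
        "bandwidth_kbps": bandwidth,
        # How many times the write rate exceeds the available bandwidth.
        "ratio": churn / bandwidth if bandwidth else float("inf"),
    }


def scan_log(lines):
    """Collect churn details from every matching line in an iterable."""
    return [hit for hit in map(parse_churn_line, lines) if hit is not None]
```

To use it, open the file under /common/logs/admin/ and pass the file object to scan_log, for example `with open("app.log") as fh: hits = scan_log(fh)`.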

Environment

VMware HCX
IO Intensive VMs

Cause

The failure occurs during the switchover phase because the online sync process runs too long (more than 5 hours) without completing. In the current implementation, the migration is designed to fail if the sync does not produce a group instance after 5 retry attempts.
The underlying cause is high data churn on the VM: data is written faster than it can be replicated within the RPO cycle.
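The relationship between churn and bandwidth can be illustrated with simple arithmetic. The Python sketch below is a simplified model and not HCX's internal logic: it assumes a constant write rate, assumes every written KB must be transferred (in practice, repeated writes to the same blocks may reduce the amount to send), and uses hypothetical function names.

```python
def backlog_growth_kbps(churn_kbps, bandwidth_kbps):
    """Rate at which unreplicated data accumulates, in KBps.

    Simplified model: every KB written must be sent, at constant rates.
    Returns 0 when the link can keep up with the write rate.
    """
    return max(0, churn_kbps - bandwidth_kbps)


def sync_can_converge(churn_kbps, bandwidth_kbps):
    """True if, under this model, replication can catch up with writes."""
    return churn_kbps < bandwidth_kbps


# Values taken from the first app.log entry in this article.
growth = backlog_growth_kbps(87902, 25000)
converges = sync_can_converge(87902, 25000)
```

With the values from the first log entry, the VM writes about 3.5 times faster than the available bandwidth allows, so under this model the delta sync never catches up and the switchover cannot obtain a group instance.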

Resolution

If zero downtime is not a strict requirement, use Bulk Migration for high churn VMs. Bulk Migration is more tolerant of heavy write activity and does not rely on continuous replication within RPO limits.

Additional Information

High data churn means the VM is constantly changing data (writes/deletes/modifies), making it difficult to keep replication in sync.
This behavior is by design in the current product version.
Manual process to review the delta sync progress
HCX UI reports an alert related to the "High data churn" during RAV delta synchronization
