RAV Migration Fails for high churn VM with Error: "Not able to get group instance post snapshot"

Article ID: 403533

Updated On:

Products

VMware HCX

Issue/Introduction

  • Migrating a VM with RAV (Replication Assisted vMotion) fails repeatedly with the following error: "Not able to get group instance post snapshot."
  • The same error is displayed in the HCX UI.



  • The app.log file under /common/logs/admin/ contains entries like the following, indicating high data churn:
    <timestamps> UTC [ReplicationTransferService_SvcThread-188921, Ent: HybridityAdmin, , TxId: ####-####-53753504ad43] INFO  c.v.h.s.r.utils.ReplicationUtil- The Virtual Machine '<vm-name>' corresponding to the transfer '####-####-e10227f3661d' is high churn VM: Detected data churn 87902 KBps is overshooting available bandwidth 25000 KBps
    <timestamps> UTC [ReplicationTransferService_SvcThread-189014, Ent: HybridityAdmin, , TxId: ####-####-53753504ad43] INFO  c.v.h.s.r.utils.ReplicationUtil- The Virtual Machine '<vm-name>' corresponding to the transfer '####-####-e10227f3661d' is high churn VM: Detected data churn 95482 KBps is overshooting available bandwidth 25000 KBps
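
To check which VMs and transfers have been flagged this way, app.log can be scanned for these entries. The following is a minimal Python sketch, not an HCX tool. The regular expression is derived only from the two sample entries above, so it assumes the message wording is stable across HCX versions, and the name find_high_churn is hypothetical.

```python
import re

# Pattern built from the ReplicationUtil message shown in the sample entries.
# Assumption: the wording of the message does not vary between HCX versions.
CHURN_RE = re.compile(
    r"The Virtual Machine '(?P<vm>[^']*)' corresponding to the transfer "
    r"'(?P<transfer>[^']*)' is high churn VM: Detected data churn "
    r"(?P<churn>\d+) KBps is overshooting available bandwidth "
    r"(?P<bandwidth>\d+) KBps"
)


def find_high_churn(lines):
    """Yield (vm, transfer_id, churn_kbps, bandwidth_kbps) for each matching line."""
    for line in lines:
        match = CHURN_RE.search(line)
        if match:
            yield (
                match["vm"],
                match["transfer"],
                int(match["churn"]),
                int(match["bandwidth"]),
            )


# Demo using one of the sample entries from this article.
sample = (
    "<timestamps> UTC [ReplicationTransferService_SvcThread-188921, "
    "Ent: HybridityAdmin, , TxId: ####-####-53753504ad43] INFO  "
    "c.v.h.s.r.utils.ReplicationUtil- The Virtual Machine '<vm-name>' "
    "corresponding to the transfer '####-####-e10227f3661d' is high churn VM: "
    "Detected data churn 87902 KBps is overshooting available bandwidth 25000 KBps"
)
for vm, transfer, churn, bandwidth in find_high_churn([sample]):
    print(f"{vm} transfer={transfer} churn={churn} KBps "
          f"bandwidth={bandwidth} KBps ratio={churn / bandwidth:.2f}")
```

To scan a real log, pass an open file object instead of the sample list, for example `find_high_churn(open("/common/logs/admin/app.log", errors="replace"))`.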

Environment

VMware HCX
IO Intensive VMs

Cause

The failure occurs during the switchover phase because the online sync runs too long (more than 5 hours) without completing. In the current implementation, the migration fails if the sync does not produce a group instance after 5 retry attempts.
The underlying cause is high data churn on the VM: data is written faster than it can be replicated within the RPO cycle. In the log entries above, the detected churn (87902 to 95482 KBps) is more than three times the available bandwidth (25000 KBps).
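
The arithmetic behind this can be illustrated with a deliberately simplified model, which is an assumption for illustration and not a description of how HCX schedules replication. It supposes that each sync cycle transfers the delta left by the previous cycle at the full available bandwidth, while the VM keeps writing new, non-overlapping data at the detected churn rate. Under that model the delta shrinks only when churn is below bandwidth. The function names are hypothetical.

```python
def sync_deltas(initial_delta_kb, churn_kbps, bandwidth_kbps, cycles):
    """Return the delta size (KB) at the start of each sync cycle.

    Simplified model (an assumption, not HCX's algorithm): transferring a
    delta takes delta / bandwidth seconds, and during that time the VM
    dirties churn * seconds of new data, which becomes the next delta.
    """
    deltas = [float(initial_delta_kb)]
    for _ in range(cycles):
        transfer_seconds = deltas[-1] / bandwidth_kbps
        deltas.append(churn_kbps * transfer_seconds)
    return deltas


def converges(churn_kbps, bandwidth_kbps):
    """Under this model the sync can only catch up if churn < bandwidth."""
    return churn_kbps < bandwidth_kbps


# Figures taken from the first log entry in this article.
churn, bandwidth = 87902, 25000
print("converges:", converges(churn, bandwidth))
for cycle, delta in enumerate(sync_deltas(1_000_000, churn, bandwidth, 5)):
    print(f"cycle {cycle}: delta = {delta:,.0f} KB")
```

With the logged figures, each cycle ends with a larger delta than it started with, which is consistent with an online sync that never completes. Real workloads often rewrite the same blocks, so actual behavior may be less severe than this model suggests.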

Resolution

If zero downtime is not a strict requirement, use Bulk Migration for high-churn VMs. This approach is more tolerant of heavy write activity and does not rely on continuous replication within RPO limits.
