OSAM migrations fail with "Failed to reserve storage" after Service Data Replicator (SDR) reaches disk capacity limit

Article ID: 434475

Products

VMware HCX

Issue/Introduction

During OS-Assisted Migration (OSAM), multiple virtual machine migrations fail with the error "Failed to reserve storage" during initial sync. The target datastore appears to have adequate free space, and migrating to an alternate datastore does not resolve the issue.

Review the HCX Manager logs at /common/logs/admin/app.log. When SDR saturation is the underlying cause, parking messages appear before the storage reservation errors. Entries similar to the following are present:

[OsAssistedReplicationService_SvcThread-####, Ent: HybridityAdmin, , TxId: ######-####-####-####-######] INFO c.v.v.h.s.o.jobs.StartReplicationJob- [transferId:'######-####-####-####-######' sid:'######-####-####-####-######' jobId:'######-####-####-####-######' hostname:'ExampleVM'] Parking Transfer: SDR at full capacity

Subsequent or concurrent migrations then fail with:

[OsAssistedReplicationService_SvcThread-####, Ent: HybridityAdmin, , TxId: ] ERROR c.v.v.h.s.o.jobs.FailTransferJob-########### [transferId:############'jobId:############' hostname:'ExampleVM'] Failed to reserve storage

The "Parking Transfer: SDR at full capacity" messages indicate the SDR has reached its disk limit. The "Failed to reserve storage" errors follow as a downstream symptom because parked jobs continue to hold active storage reservations on the target datastore until they time out.

This scenario differs from a standard "Failed to reserve storage" failure where the datastore genuinely lacks free space or has too many concurrent migrations targeted to it. If adequate datastore capacity exists and migrating to an alternate datastore does not resolve the error, check for the SDR parking messages described above.
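The log check described above can be sketched in Python. This is a minimal illustration, not an HCX tool: it assumes the two message strings and the hostname:'...' field appear exactly as in the excerpts above, and the names summarize_osam_log and OsamLogSummary are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Message fragments copied from the log excerpts in this article.
PARKED_MARKER = "Parking Transfer: SDR at full capacity"
FAILED_MARKER = "Failed to reserve storage"
# Hostname field as it appears in the excerpts, e.g. hostname:'ExampleVM'
HOSTNAME_RE = re.compile(r"hostname:'([^']*)'")


@dataclass
class OsamLogSummary:
    """Hypothetical container for what the scan found."""
    parked_hosts: list = field(default_factory=list)
    failed_hosts: list = field(default_factory=list)
    # True if at least one reservation failure was logged after a parking message.
    failure_after_parking: bool = False


def summarize_osam_log(lines):
    """Scan log lines in order and collect parked and failed transfers."""
    summary = OsamLogSummary()
    for line in lines:
        match = HOSTNAME_RE.search(line)
        host = match.group(1) if match else "<unknown>"
        if PARKED_MARKER in line:
            summary.parked_hosts.append(host)
        elif FAILED_MARKER in line:
            summary.failed_hosts.append(host)
            if summary.parked_hosts:
                summary.failure_after_parking = True
    return summary


# Example use against the log path named in this article:
# with open("/common/logs/admin/app.log", errors="replace") as log:
#     print(summarize_osam_log(log))
```

If failure_after_parking is true, the pattern is consistent with SDR saturation. If failures appear with no parking messages, the standard causes (insufficient datastore space or too many concurrent migrations) are the more likely explanation.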

Environment

VMware HCX 4.11

Cause

Each Service Mesh SDR in HCX 4.11 supports a maximum of 50 disks. When this limit is reached, new OSAM replication jobs are parked in a queue until SDR capacity frees up. While parked, these jobs maintain active storage reservations on the target datastore. When those reservations time out, the affected migrations fail with "Failed to reserve storage."

The storage reservation error is not caused by insufficient datastore space. It is a downstream result of SDR saturation: parked jobs cannot progress, and the reservations they hold block subsequent jobs from reserving space.

Resolution

  1. Deploy a new Service Mesh within the same environment to add SDR capacity.
  2. For the affected VMs, uninstall the existing Sentinel Agent.
  3. Reinstall the Sentinel Agent using the installer from the new Service Mesh. This registers the VM with the new Service Mesh and allows it to use the available SDR capacity.
  4. Retry the failed migrations.

For future migration planning, account for the 50-disk-per-SDR limit and distribute workloads across multiple Service Meshes proactively to avoid SDR saturation. Refer to the HCX 4.11 Configuration Maximums for current limits.
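As a planning aid, the 50-disk limit can be treated as a bin-packing constraint. The sketch below groups VMs so that no group exceeds the limit, using a first-fit-decreasing heuristic. It is an illustration only: the VM names and disk counts are made-up inputs, plan_sdr_groups is a hypothetical helper, and it assumes every disk of a VM counts against the limit of the SDR serving that VM. Each resulting group would map to one Service Mesh.

```python
SDR_DISK_LIMIT = 50  # per-SDR disk limit stated in this article for HCX 4.11


def plan_sdr_groups(vm_disks, limit=SDR_DISK_LIMIT):
    """Group VMs so each group's total disk count stays within the limit.

    vm_disks maps a VM name to its disk count (inventory supplied by the caller).
    Uses first-fit decreasing: largest VMs are placed first, each into the
    first group that still has room.
    """
    groups = []  # each entry: {"vms": [names], "disks": total}
    # Sort by disk count descending, then by name for a stable result.
    for name, disks in sorted(vm_disks.items(), key=lambda kv: (-kv[1], kv[0])):
        if disks > limit:
            # Assumption: a single VM cannot be split across SDRs.
            raise ValueError(f"{name} has {disks} disks, above the limit of {limit}")
        for group in groups:
            if group["disks"] + disks <= limit:
                group["vms"].append(name)
                group["disks"] += disks
                break
        else:
            groups.append({"vms": [name], "disks": disks})
    return groups


# Hypothetical inventory: VM name -> number of disks.
inventory = {"app01": 30, "db01": 25, "web01": 20, "web02": 20, "util01": 5}
for index, group in enumerate(plan_sdr_groups(inventory), start=1):
    print(f"group {index}: {group['disks']} disks, VMs: {', '.join(group['vms'])}")
```

First-fit decreasing does not always find the minimum number of groups, but it is simple and usually close. Check the limit against the HCX Configuration Maximums for the release in use before relying on the value of 50.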

If the error persists after following these steps, contact Broadcom Support for further assistance.

Additional Information