HCX migrations stuck in "Waiting for maintenance window" for more than 30 days

Products

VMware HCX

Issue/Introduction

VMs do not migrate and remain stuck in "Waiting for maintenance window" even after the "Schedule Now" option is selected.
Running the check_mig_tracker_issue.sql script lists the migration entries that are stuck.

Environment

VMware HCX 4.11.x

Cause

A purging policy removes the migration tracker 30 days after the migration start time.
If the HCX Manager or its services are restarted after the purging policy has run, the migration workflow cannot find the migration tracker and fails to revive.
As a result, the migration stays stuck in "Waiting for maintenance window".
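The 30-day window can be checked with simple arithmetic: 30 days is 720 hours, the same value the patch output reports for expirationPeriodHours before the change. The following is a minimal sketch only; the function names and the RETENTION_HOURS variable are hypothetical and are not part of HCX.

```shell
# Retention applied to migration trackers, in hours (30 days x 24).
RETENTION_HOURS=720

# tracker_age_hours START_EPOCH NOW_EPOCH
# Prints the whole hours elapsed between two Unix timestamps.
tracker_age_hours() {
  echo $(( ($2 - $1) / 3600 ))
}

# past_retention START_EPOCH NOW_EPOCH
# Succeeds (exit 0) when the tracker age has reached RETENTION_HOURS.
past_retention() {
  [ "$(tracker_age_hours "$1" "$2")" -ge "$RETENTION_HOURS" ]
}

# Example, assuming GNU date (commented out so nothing runs on paste):
# start=$(date -d '2025-08-14 14:53:27' +%s)
# past_retention "$start" "$(date +%s)" && echo "tracker may already be purged"
```

A migration for which past_retention succeeds is one where a restart of the HCX Manager or its services could trigger this issue.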

Resolution

This issue is resolved in VMware HCX 4.11.4, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
For details, refer to the VMware HCX 4.11.4 Release Notes.

Note: If an existing migration is still progressing more than 30 days after it was started, do not restart the HCX Manager or its services. In any case, complete the switchover as soon as possible.

Workaround

Execute the patch script ‘patch_jobcontrol_expiration.sh’ on both HCX Managers.
Execute the ‘check_mig_tracker_issue.sql’ script on the source HCX Manager only.
Cancel the affected migrations (those listed in the output of the ‘check_mig_tracker_issue.sql’ script) from the migration wizard.
Select the cancelled migrations and click FORCE CANCEL. For more information, see Force Cleanup for a Failed or Canceled Migration.
Re-trigger the migrations.

Patch execution steps

Note:

Execute the patch script ‘patch_jobcontrol_expiration.sh’ on both HCX Manager appliances.
If an existing migration is still progressing more than 30 days after it was started, do not restart the HCX Manager or its services. In any case, complete the switchover as soon as possible.
Execute the script only during a scheduled maintenance window, because it stops and restarts app-engine.
If you believe you have hit this issue, execute the script only after completing the remediation.
You can apply the patch even if you have not hit this issue, provided there are no active migrations and no Network Extension or Un-Extension operations in progress.

SSH into the HCX Manager appliance
Switch to root
Transfer the script to the HCX Manager

Make the script executable

chmod +x <Path_to_file>/patch_jobcontrol_expiration.sh

Run the script
Command:

bash <Path_to_file>/patch_jobcontrol_expiration.sh

Verify the output
A successful run produces output similar to:

[INFO] Stopping app-engine...
[INFO] app-engine stopped successfully
[INFO] Backup created at /home/admin/DataCleanupService.zql.bak.20260217_143025
[SUCCESS] JobControl expirationPeriodHours updated from 720 to 4320 in /opt/vmware/deploy/zookeeper/DataCleanupService.zql
[INFO] Starting app-engine...
[INFO] app-engine started successfully

Confirm the change manually
Command:

grep -A1 '"collection": "JobControl"' /opt/vmware/deploy/zookeeper/DataCleanupService.zql

Output:

"collection": "JobControl",
"expirationPeriodHours": 4320
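The manual check above can also be scripted. The sketch below assumes the file layout shown in the output, with expirationPeriodHours on the line directly after the JobControl collection line. Both function names are hypothetical.

```shell
# jobcontrol_expiration_hours ZQL_FILE
# Prints the expirationPeriodHours value that follows the JobControl entry.
jobcontrol_expiration_hours() {
  grep -A1 '"collection": "JobControl"' "$1" \
    | sed -n 's/.*"expirationPeriodHours": *\([0-9][0-9]*\).*/\1/p'
}

# hours_to_days HOURS
# Converts a retention period in hours to whole days.
hours_to_days() {
  echo $(( $1 / 24 ))
}

# Example (commented out so nothing runs on paste):
# h=$(jobcontrol_expiration_hours /opt/vmware/deploy/zookeeper/DataCleanupService.zql)
# echo "JobControl retention: $h hours ($(hours_to_days "$h") days)"
```

With the patched value of 4320 hours the retention is 180 days, compared with 30 days for the original 720 hours.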

Exit codes and their meaning:

| Code | Meaning |
|------|---------|
| 0    | Success |
| 1    | ZQL file not found |
| 2    | JobControl entry not found in file |
| 3    | sed command failed |
| 5    | Backup creation failed |
| 6    | Failed to stop app-engine |
| 7    | Failed to start app-engine |
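When the patch is run from a wrapper or a change-management job, the exit code can be translated back into the meanings in the table. This is an illustrative sketch; describe_patch_exit is a hypothetical helper, and any code not listed in the table is reported as unknown.

```shell
# describe_patch_exit CODE
# Maps an exit code of patch_jobcontrol_expiration.sh to its documented meaning.
describe_patch_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "ZQL file not found" ;;
    2) echo "JobControl entry not found in file" ;;
    3) echo "sed command failed" ;;
    5) echo "Backup creation failed" ;;
    6) echo "Failed to stop app-engine" ;;
    7) echo "Failed to start app-engine" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Example (commented out so nothing runs on paste):
# bash <Path_to_file>/patch_jobcontrol_expiration.sh; rc=$?
# echo "patch_jobcontrol_expiration.sh: $(describe_patch_exit "$rc")"
```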

Rollback of the patch:

sudo su -
systemctl stop app-engine
cp /home/admin/DataCleanupService.zql.bak.<TIMESTAMP> /opt/vmware/deploy/zookeeper/DataCleanupService.zql
systemctl start app-engine
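If the patch was run more than once, several backups may exist. Because the timestamp suffix has the form YYYYMMDD_HHMMSS, sorting the file names alphabetically also sorts them by time. The sketch below finds the most recent backup to use as <TIMESTAMP> in the rollback; latest_zql_backup is a hypothetical helper, and /home/admin is taken from the sample patch output.

```shell
# latest_zql_backup DIR
# Prints the newest DataCleanupService.zql backup in DIR, or nothing if none exist.
latest_zql_backup() {
  for f in "$1"/DataCleanupService.zql.bak.*; do
    # Skip the unexpanded pattern when no backup matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}

# Example (commented out so nothing runs on paste):
# latest_zql_backup /home/admin
```

Note that the newest backup may already contain a patched value if the script was run more than once, so confirm its expirationPeriodHours before restoring it.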

check_mig_tracker_issue.sql script execution steps:

SSH into the HCX Manager appliance
Switch to root
Transfer the ‘check_mig_tracker_issue.sql’ script to the HCX Manager

Make the script executable

chmod +x <Path_to_file>/check_mig_tracker_issue.sql

Run the script
Command:

psql hybridity -f <Path_to_file>/check_mig_tracker_issue.sql

Verify the output
A successful run produces output similar to:

 migration_group_id  | migration_id        | tracker_job_id      | tracker_state         | tracker_concluded | tracker_creation_date   | tracker_last_updated     | parent_job_id       | parent_job_type | parent_workflow_type | parent_state | parent_previous_state | parent_concluded | parent_last_updated | flag                        | parent_missing
---------------------+---------------------+---------------------+-----------------------+-------------------+-------------------------+--------------------------+---------------------+-----------------+----------------------+--------------+-----------------------+------------------+---------------------+-----------------------------+----------------
 ####-####-####-#### | ####-####-####-#### | ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-08-14 14:53:27.925 | 2025-12-16T10:26:21.034Z | ####-####-####-#### |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
 ####-####-####-#### | ####-####-####-#### | ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-11-03 17:09:14.052 | 2025-12-16T10:26:27.394Z | ####-####-####-#### |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
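To get a short list of the migrations to cancel, the script output can be filtered down to the migration_id column. The sketch below assumes the script returns a single result set with the 16 columns shown above, with migration_id in column 2 and flag in column 15. affected_migration_ids is a hypothetical helper. The psql options -A (unaligned output) and -t (rows only) keep each row on one line.

```shell
# affected_migration_ids
# Reads pipe-separated psql rows on stdin and prints the migration_id of
# every row flagged PARENT_MISSING_IN_JOB_TABLE.
affected_migration_ids() {
  awk -F'|' '
    {
      id = $2; flag = $15
      # Trim padding so aligned and unaligned output both work.
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      gsub(/^[ \t]+|[ \t]+$/, "", flag)
      if (flag == "PARENT_MISSING_IN_JOB_TABLE") print id
    }'
}

# Example (commented out so nothing runs on paste):
# psql hybridity -A -t -f <Path_to_file>/check_mig_tracker_issue.sql | affected_migration_ids
```

The resulting IDs are the migrations to cancel and then force cancel in the migration wizard.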

Attachments

check_mig_tracker_issue.sql

patch_jobcontrol_expiration.sh