HCX migrations stuck in "Waiting for maintenance window" for more than 30 days

Article ID: 429721

Products

VMware HCX

Issue/Introduction

  • VMs do not migrate and remain stuck in the "Waiting for maintenance window" state even after the "Schedule Now" option is selected.

  • Running the check_mig_tracker_issue.sql script lists the migration entries that are stuck.

Environment

  • VMware HCX 4.11.x

Cause

  • A purging policy removes the migration tracker 30 days after the migration start time.

  • If the HCX Manager or its services are restarted after the purging policy has run, the migration workflow cannot find the migration tracker and fails to revive.

  • As a result, the migration remains stuck in "Waiting for maintenance window".
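
The timing described above comes down to simple arithmetic. The following is a minimal sketch, assuming the purge threshold is 720 hours (30 days) counted from the migration start time; the function name `tracker_purge_due` and its epoch-second arguments are hypothetical and not part of HCX.

```shell
# Hypothetical helper (not part of HCX): reports whether a migration tracker
# is old enough to have been purged. Arguments are the migration start time
# and the current time in epoch seconds, plus an optional threshold in hours.
# The 720-hour default is an assumption based on the 30-day purge period.
tracker_purge_due() {
    local start_epoch=$1 now_epoch=$2 threshold_hours=${3:-720}
    local age_hours=$(( (now_epoch - start_epoch) / 3600 ))
    if [ "$age_hours" -ge "$threshold_hours" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Example usage (assumes GNU date for the -d option):
#   tracker_purge_due "$(date -d '2025-08-14 14:53:27' +%s)" "$(date +%s)"
```

A "yes" result only indicates that the tracker may already have been purged; whether a migration is actually affected still depends on a restart having occurred afterwards.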

Resolution

This issue is resolved in VMware HCX 4.11.4, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
For details, refer to the VMware HCX 4.11.4 Release Notes.

Note: If an existing migration is still progressing more than 30 days after it was started, do not restart the services or the HCX Manager. Perform the switchover as soon as possible.

Workaround

  • Run the patch ‘patch_jobcontrol_expiration.sh’ on both HCX Managers.
  • Run the ‘check_mig_tracker_issue.sql’ script on the source HCX Manager only.
  • From the migration wizard, cancel the affected migrations (those listed in the output of the 'check_mig_tracker_issue.sql' script).
  • Select the cancelled migrations and click FORCE CANCEL. For more information, see Force Cleanup for a Failed or Canceled Migration.
  • Re-trigger the migration.

 

Patch execution steps

Note:

  • Run the patch ‘patch_jobcontrol_expiration.sh’ on both HCX Manager appliances.
  • If an existing migration is still progressing more than 30 days after it was started, do not restart the services or the HCX Manager. Perform the switchover as soon as possible.
  • Run the script only during a scheduled maintenance window.
  • If you believe you have hit this issue, run the script only after the remediation.
  • You can apply the patch even if you have not hit this issue, provided that no migrations and no Network Extension or Un-Extension operations are in progress.

  1. SSH into the HCX Manager appliance
  2. Switch to root
  3. Transfer the script to the HCX Manager
  4. Make the script executable
    chmod +x <Path_to_file>/patch_jobcontrol_expiration.sh
  5. Run the script
    Command:
    bash <Path_to_file>/patch_jobcontrol_expiration.sh
  6. Verify the output
    A successful run produces output similar to:
    [INFO] Stopping app-engine...
    [INFO] app-engine stopped successfully
    [INFO] Backup created at /home/admin/DataCleanupService.zql.bak.20260217_143025
    [SUCCESS] JobControl expirationPeriodHours updated from 720 to 4320 in /opt/vmware/deploy/zookeeper/DataCleanupService.zql
    [INFO] Starting app-engine...
    [INFO] app-engine started successfully
  7. Confirm the change manually
    Command:
    grep -A1 '"collection": "JobControl"' /opt/vmware/deploy/zookeeper/DataCleanupService.zql
    Output:
    "collection": "JobControl",
    "expirationPeriodHours": 4320
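
The manual check above can also be scripted. This is a minimal sketch, assuming the `expirationPeriodHours` line directly follows the JobControl `collection` line, as in the sample output; the function name `jobcontrol_expiration_hours` is hypothetical.

```shell
# Hypothetical helper: prints the expirationPeriodHours value that follows
# the JobControl collection line in the given file. The two-line layout is
# assumed from the sample grep output in this article.
jobcontrol_expiration_hours() {
    local zql_file=$1
    grep -A1 '"collection": "JobControl"' "$zql_file" \
        | sed -n 's/.*"expirationPeriodHours": *\([0-9][0-9]*\).*/\1/p'
}

# Example usage on the HCX Manager:
#   hours=$(jobcontrol_expiration_hours /opt/vmware/deploy/zookeeper/DataCleanupService.zql)
#   [ "$hours" = "4320" ] && echo "patched" || echo "not patched (value: $hours)"
```

A value of 4320 hours corresponds to 180 days, compared with 30 days for the original 720 hours.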

Exit codes and their meaning:

| Code | Meaning |
|------|---------|
| 0    | Success |
| 1    | ZQL file not found |
| 2    | JobControl entry not found in file |
| 3    | sed command failed |
| 5    | Backup creation failed |
| 6    | Failed to stop app-engine |
| 7    | Failed to start app-engine |
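
When the patch is run from a wrapper or a runbook, the table above can be turned into a small lookup. This is a sketch only; the function name `describe_patch_exit` is hypothetical, and any code not listed in the table is reported as unknown rather than guessed.

```shell
# Hypothetical helper: translates an exit code of patch_jobcontrol_expiration.sh
# into the meaning documented in the table above. Codes that are not listed
# in the table (for example 4) are reported as unknown.
describe_patch_exit() {
    case "$1" in
        0) echo "Success" ;;
        1) echo "ZQL file not found" ;;
        2) echo "JobControl entry not found in file" ;;
        3) echo "sed command failed" ;;
        5) echo "Backup creation failed" ;;
        6) echo "Failed to stop app-engine" ;;
        7) echo "Failed to start app-engine" ;;
        *) echo "Unknown exit code: $1" ;;
    esac
}

# Example usage:
#   bash <Path_to_file>/patch_jobcontrol_expiration.sh
#   describe_patch_exit "$?"
```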

 

Rollback of the patch:

  1. sudo su -
  2. systemctl stop app-engine
  3. cp /home/admin/DataCleanupService.zql.bak.<TIMESTAMP> /opt/vmware/deploy/zookeeper/DataCleanupService.zql
  4. systemctl start app-engine
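
If several backups exist, the `<TIMESTAMP>` to restore has to be chosen. The sketch below assumes the backup names follow the `DataCleanupService.zql.bak.YYYYMMDD_HHMMSS` pattern seen in the sample patch output, so that the alphabetically last name is also the newest; the function name `latest_zql_backup` is hypothetical.

```shell
# Hypothetical helper: prints the newest DataCleanupService.zql backup in a
# directory (default /home/admin). It relies on the shell expanding globs in
# sorted order and on the assumed YYYYMMDD_HHMMSS suffix, which sorts
# chronologically. Returns non-zero if no backup is found.
latest_zql_backup() {
    local backup_dir=${1:-/home/admin}
    local candidate latest=""
    for candidate in "$backup_dir"/DataCleanupService.zql.bak.*; do
        if [ -e "$candidate" ]; then
            latest=$candidate
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}

# Example usage:
#   latest_zql_backup /home/admin
```

Review the printed path before copying it over the live file, since the newest backup is not necessarily the one taken before the patch was first applied.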

 

check_mig_tracker_issue.sql script execution steps:

  1. SSH into the HCX Manager appliance
  2. Switch to root
  3. Transfer the ‘check_mig_tracker_issue.sql’ script to the HCX Manager
  4. Make the script executable (optional: psql -f only needs read access to the file)
    chmod +x <Path_to_file>/check_mig_tracker_issue.sql
  5. Run the script
    Command:
    psql hybridity -f <Path_to_file>/check_mig_tracker_issue.sql
  6. Verify the output
    A successful run produces output similar to:
     migration_group_id  |    migration_id     |   tracker_job_id    |     tracker_state     | tracker_concluded |  tracker_creation_date  |   tracker_last_updated   |    parent_job_id    | parent_job_type | parent_workflow_type | parent_state | parent_previous_state | parent_concluded | parent_last_updated |            flag             | parent_missing
    ---------------------+---------------------+---------------------+-----------------------+-------------------+-------------------------+--------------------------+---------------------+-----------------+----------------------+--------------+-----------------------+------------------+---------------------+-----------------------------+----------------
     ####-####-####-#### | ####-####-####-#### | ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-08-14 14:53:27.925 | 2025-12-16T10:26:21.034Z | ####-####-####-#### |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
     ####-####-####-#### | ####-####-####-#### | ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-11-03 17:09:14.052 | 2025-12-16T10:26:27.394Z | ####-####-####-#### |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
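
To get just the list of migrations to cancel, the output can be filtered. This is a minimal sketch, assuming the script returns the columns in the order shown above (migration_id second, flag fifteenth); the function name `stuck_migration_ids` is hypothetical.

```shell
# Hypothetical helper: reads pipe-separated rows on stdin and prints the
# migration_id (column 2) of every row whose flag (column 15) is
# PARENT_MISSING_IN_JOB_TABLE. Column positions are assumed from the sample
# output in this article; header and separator lines do not match the flag
# and are skipped.
stuck_migration_ids() {
    awk -F'|' '
        {
            id = $2
            flag = $15
            gsub(/^[ \t]+|[ \t]+$/, "", id)      # trim padding around the value
            gsub(/^[ \t]+|[ \t]+$/, "", flag)
            if (flag == "PARENT_MISSING_IN_JOB_TABLE") print id
        }'
}

# Example usage (psql -A selects unaligned output, -t prints rows only):
#   psql hybridity -At -f <Path_to_file>/check_mig_tracker_issue.sql | stuck_migration_ids
```

The filter also works on the default aligned output, because the padding around each value is trimmed before the comparison.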

 



Attachments

check_mig_tracker_issue.sql
patch_jobcontrol_expiration.sh