Synced VMs waiting for more than 30 days are not getting cutover and stuck at 'waiting for maintenance window'

Article ID: 429721

Products

VMware HCX

Issue/Introduction

  • VMs do not migrate and remain stuck at "Waiting for maintenance window", even after the "Schedule Now" option is selected.

  • Executing Check_mig_tracker_issues.sql (script attached to this KB) shows some migration entries that are stuck.

  • The behavior is seen with both HCX 4.11.3 and HCX 9.0.1.

Environment

  • VMware HCX 4.11.3
  • VMware HCX 9.0.1

Cause

  • A purging policy removes the migration tracker 30 days after the migration start time.

  • If the HCX Manager or its services are restarted after the purging policy has run, the migration workflow cannot find the migration tracker and fails to revive.

  • As a result, the migration remains stuck at "Waiting for maintenance window".

Resolution

This issue is resolved in VMware HCX 4.11.4, available at Broadcom downloads, and in the upcoming VMware HCX 9.1 VCF release.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Refer to the VMware HCX 4.11.4 Release Notes.

Note: If you have an existing migration that is still progressing more than 30 days after it was started, do not restart services or the HCX Manager. Attempt the switchover as soon as possible.

If you have already hit the issue, follow the workaround below.

  1. Identify: Execute Check_mig_tracker_issues.sql on the Source HCX Manager.
  2. Cancel: Use the migration wizard to cancel affected migrations. Refer: Canceling a Migration
  3. Force Clean: Use the "Force Cancel" option if the standard cancellation fails. Refer: Force Cleanup for a Failed or Canceled Migration
  4. Verify: Re-run the SQL script to ensure the list is clear.
  5. Patch: Apply patch_jobcontrol_expiration.sh to both managers.
  6. Re-trigger: Restart the migration workflow.

Patch execution steps

Note:

  • Execute the patch 'patch_jobcontrol_expiration.sh' on both HCX Manager appliances.
  • If you have an existing migration that is still progressing more than 30 days after it was started, do not restart services or the HCX Manager. In any case, attempt the switchover as soon as possible.
  • Execute the script only during a scheduled maintenance window.
  • If you believe you have hit this issue, execute the script only after completing the remediation steps in the workaround above.
  • You can apply the patch even if you have not hit this issue, provided there are no active migrations or Network Extension or Un-Extension operations in progress.

  1. SSH into the HCX Manager appliance
  2. Switch to root
  3. Transfer the script to the HCX Manager
  4. Make the script executable
    Command:
    chmod +x <Path_to_file>/patch_jobcontrol_expiration.sh
  5. Run the script
    Command:
    bash <Path_to_file>/patch_jobcontrol_expiration.sh
  6. Verify the output
    A successful run produces output similar to:
    [INFO] Stopping app-engine...
    [INFO] app-engine stopped successfully
    [INFO] Backup created at /home/admin/DataCleanupService.zql.bak.20260217_143025
    [SUCCESS] JobControl expirationPeriodHours updated from 720 to 4320 in /opt/vmware/deploy/zookeeper/DataCleanupService.zql
    [INFO] Starting app-engine...
    [INFO] app-engine started successfully
  7. Confirm the change manually
    Command:
    grep -A1 '"collection": "JobControl"' /opt/vmware/deploy/zookeeper/DataCleanupService.zql
    Output:
    "collection": "JobControl",
    "expirationPeriodHours": 4320

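The manual confirmation can also be scripted. The following is a minimal sketch and is not part of the attached patch: the function name jobcontrol_hours is hypothetical, and it assumes the "expirationPeriodHours" line immediately follows the "collection": "JobControl" line, as in the grep output shown in the patch steps. It also converts the value to days (720 hours is the original 30-day window, 4320 hours is 180 days).

```shell
#!/usr/bin/env bash
# Sketch (assumption: not shipped with the KB): read the JobControl
# retention value from a DataCleanupService.zql file.

jobcontrol_hours() {
  local zql_file="$1"
  # Same grep as the manual check, then keep only the trailing number.
  grep -A1 '"collection": "JobControl"' "$zql_file" \
    | grep -o '"expirationPeriodHours": *[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

ZQL=/opt/vmware/deploy/zookeeper/DataCleanupService.zql
if [ -r "$ZQL" ]; then
  hours="$(jobcontrol_hours "$ZQL")"
  echo "JobControl retention: ${hours} hours ($(( hours / 24 )) days)"
fi
```

On a patched appliance the reported value should be 4320 hours. If the function prints nothing, the file layout differs from the assumption above and the manual grep should be used instead.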
Exit codes and their meaning:

| Code | Meaning |
|------|---------|
| 0    | Success |
| 1    | ZQL file not found |
| 2    | JobControl entry not found in file |
| 3    | sed command failed |
| 5    | Backup creation failed |
| 6    | Failed to stop app-engine |
| 7    | Failed to start app-engine |
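
When the patch is run from a wrapper or a change-management job, the exit code can be translated back into the meanings listed above. This is a minimal sketch; describe_patch_exit is a hypothetical helper name, and the messages are copied from the table.

```shell
#!/usr/bin/env bash
# Sketch: map the exit code of patch_jobcontrol_expiration.sh to its
# documented meaning. Codes not listed in the table are reported as unknown.

describe_patch_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "ZQL file not found" ;;
    2) echo "JobControl entry not found in file" ;;
    3) echo "sed command failed" ;;
    5) echo "Backup creation failed" ;;
    6) echo "Failed to stop app-engine" ;;
    7) echo "Failed to start app-engine" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Example usage on the appliance (path is a placeholder):
#   bash <Path_to_file>/patch_jobcontrol_expiration.sh
#   describe_patch_exit "$?"
```

A result of 6 or 7 means the state of app-engine should be checked before any further action is taken.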

Rollback of the patch:

  1. sudo su -
  2. systemctl stop app-engine
  3. cp /home/admin/DataCleanupService.zql.bak.<TIMESTAMP> /opt/vmware/deploy/zookeeper/DataCleanupService.zql
  4. systemctl start app-engine
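
If the patch was run more than once, several backups may exist and the <TIMESTAMP> to restore has to be chosen. The sketch below picks the newest one. It assumes the backup names follow the YYYYMMDD_HHMMSS pattern seen in the sample patch output, so that a plain sort orders them chronologically; latest_backup is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent DataCleanupService.zql backup in a directory.
# Prints nothing if no backup is found.

latest_backup() {
  local dir="$1"
  ls -1 "$dir"/DataCleanupService.zql.bak.* 2>/dev/null | sort | tail -n 1
}

# Example usage, between the stop and start steps of the rollback:
#   cp "$(latest_backup /home/admin)" /opt/vmware/deploy/zookeeper/DataCleanupService.zql
```

Note that the newest backup is not necessarily the unpatched one if the patch was applied repeatedly, so confirm the contents of the chosen file before copying it.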

Check_mig_tracker_issues.sql script execution steps:

  1. SSH into the HCX Manager appliance
  2. Switch to root
  3. Transfer the ‘Check_mig_tracker_issues.sql’ script to the HCX Manager
  4. Make the script executable
    chmod +x <Path_to_file>/Check_mig_tracker_issues.sql
  5. Run the script
    Command:
    psql hybridity -f <Path_to_file>/Check_mig_tracker_issues.sql
  6. Verify the output
    A successful run produces output similar to:
    migration_group_id          |             migration_id             |            tracker_job_id            |     tracker_state     | tracker_concluded |  tracker_creation_date  |   tracker_last_updated   |            parent_job_id             | parent_job_type | parent_workflow_type | parent_state | parent_previous_state | parent_concluded | parent_last_updated |            flag             | parent_missing
    --------------------------------------+--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------+--------------------------+--------------------------------------+-----------------+----------------------+--------------+-----------------------+------------------+---------------------+-----------------------------+----------------
     ####-####-####-#### |  ####-####-####-#### |  ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-08-14 14:53:27.925 | 2025-12-16T10:26:21.034Z |  ####-####-####-#### |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
     ####-####-####-#### |  ####-####-####-#### |  ####-####-####-#### | WAIT_FOR_MAINT_WINDOW | f                 | 2025-11-03 17:09:14.052 | 2025-12-16T10:26:27.394Z |  ####-####-####-####X |                 |                      |              |                       |                  |                     | PARENT_MISSING_IN_JOB_TABLE | t
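
Because the output is very wide, it can be easier to save it to a file and count the affected rows than to read it on screen. The following is a minimal sketch; count_orphaned_trackers is a hypothetical helper name, the output file path is an arbitrary choice, and the sketch assumes that each affected migration appears as one row carrying the PARENT_MISSING_IN_JOB_TABLE flag, as in the sample above.

```shell
#!/usr/bin/env bash
# Sketch: count rows flagged PARENT_MISSING_IN_JOB_TABLE in saved script output.

count_orphaned_trackers() {
  # grep -c prints the number of matching lines (0 when there are none).
  grep -c 'PARENT_MISSING_IN_JOB_TABLE' "$1"
}

# Example usage on the Source HCX Manager (output path is an assumption):
#   psql hybridity -f <Path_to_file>/Check_mig_tracker_issues.sql > /tmp/mig_tracker_check.txt
#   count_orphaned_trackers /tmp/mig_tracker_check.txt
```

A count of 0 after the cancel and force cleanup steps corresponds to the "Verify" step of the workaround, where the list is expected to be clear.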

Additional Information

VM switchover fails in HCX environment that is patched for purging policy issue

Attachments

Check_mig_tracker_issues.sql
patch_jobcontrol_expiration.sh