VMware Aria Automation Orchestrator upgrade to v8.18.1U4 failed due to stale kube-system pods

Article ID: 431662

Products

VCF Automation

Issue/Introduction

When attempting to upgrade VMware Aria Automation Orchestrator from v8.17.0 to v8.18.1U4, the upgrade fails during the preparation phase with a "Preparation Error". The existing deployment is not affected, but the upgrade cannot proceed.

The on-screen upgrade report and the prelude logs (/var/log/vmware/prelude) show errors similar to the following, indicating that one or more pods did not reach a Ready state:

[ERROR][2026-01-30 03:38:09][vROFQDN] Pod: vco-app-#######-##### is not in Ready or Completed state. All pods must be in either of these states.
[ERROR][2026-01-30 03:38:09][vROFQDN] Services verification found errors.
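
To confirm that a failed attempt matches this article, the prelude logs can be searched for the two messages shown above. The following is a minimal sketch, not part of the product: the function name find_pod_readiness_errors is hypothetical, and it assumes the logs are plain text files under the directory named in this article.

```shell
# Hypothetical helper (not shipped with the appliance): print the readiness
# errors described in this article from every file under a log directory.
find_pod_readiness_errors() {
    # Default to the prelude log directory named in the article.
    log_dir="${1:-/var/log/vmware/prelude}"
    # -r: recurse, -h: omit file names, -E: extended regular expression.
    grep -rhE '\[ERROR\].*(is not in Ready or Completed state|Services verification found errors)' "$log_dir" 2>/dev/null
}

# Example call on the appliance:
#   find_pod_readiness_errors /var/log/vmware/prelude
```

If the function prints nothing, the failure may have a different cause than the one described here.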

Environment

  • VMware Aria Automation Orchestrator 8.17.0

  • VMware Aria Automation Orchestrator 8.18.1U4

Cause

This issue occurs because the kube-system namespace pods have been running for an extended period (greater than 30 days) and have become stale. The pre-upgrade health checks require all pods, including vco-app pods, to be in a "Ready" or "Completed" state. Stale core Kubernetes system pods prevent proper communication and initialization, causing the vco-app pods to fail the readiness verification.
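
The 30-day threshold can also be checked without reading the AGE column by eye. The sketch below is an illustration under stated assumptions: the function name stale_pods is hypothetical, it expects "name creationTimestamp" pairs on standard input, and it relies on GNU date (the -d option) to parse the ISO 8601 timestamps that Kubernetes reports.

```shell
# Hypothetical helper: read "<pod-name> <creationTimestamp>" pairs on stdin
# and print the pods older than a threshold (default 30 days).
# Assumes GNU date, which accepts ISO 8601 timestamps with -d.
stale_pods() {
    max_days="${1:-30}"
    # The second argument lets a caller pin "now" (epoch seconds); defaults to the current time.
    now="${2:-$(date -u +%s)}"
    while read -r name created; do
        [ -n "$name" ] || continue
        # Skip lines whose timestamp cannot be parsed.
        created_s=$(date -u -d "$created" +%s 2>/dev/null) || continue
        age_days=$(( (now - created_s) / 86400 ))
        if [ "$age_days" -gt "$max_days" ]; then
            printf '%s\t%sd\n' "$name" "$age_days"
        fi
    done
}

# Example call on the appliance:
#   kubectl get pods -n kube-system \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' \
#     | stale_pods 30
```

Any pod the function prints has exceeded the threshold and is a candidate for the refresh described in the Resolution section.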

Resolution

To resolve this issue, you must refresh the kube-system pods and clear out the failed upgrade files before retrying the upgrade.

  1. Revert the environment back to the snapshot taken prior to the first upgrade attempt.

  2. Connect to the VMware Aria Automation Orchestrator appliance via SSH.

  3. Stop the services in the environment by running the following command:

    /opt/scripts/deploy.sh --onlyClean
    
  4. Verify the age of the kube-system pods:

    kubectl get pods -n kube-system
    
  5. If the AGE column shows pods older than 30 days, delete all pods in the kube-system namespace so that Kubernetes recreates them:

    kubectl delete pod -n kube-system --all
    
  6. Run the deploy script so that the services are redeployed against the refreshed pods:

    /opt/scripts/deploy.sh
    
  7. Perform a cleanup of the prelude upgrade files from the previous failed attempt and clear the associated cron jobs:

    vracli cluster exec -- bash -c 'rm -rf /data/restorepoint /var/vmware/prelude/upgrade /var/log/vmware/prelude/upgrade-report-latest*; crontab -u root -l | grep -v -F "/opt/scripts/upgrade/upg-mon.sh" | crontab -u root -'
    
  8. Start the environment services again:

    /opt/scripts/deploy.sh
    
  9. Reattempt the upgrade to v8.18.1U4.
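
The crontab part of the cleanup command in step 7 lists the root crontab, drops every line that mentions the upgrade monitor script, and installs the result. The sketch below isolates that filter so its effect can be previewed before running the cleanup. The function name drop_upgrade_monitor_entries is hypothetical. The grep options are the ones used in step 7.

```shell
# Hypothetical helper mirroring the filter in step 7: read crontab text on
# stdin and print every line that does NOT mention the upgrade monitor script.
drop_upgrade_monitor_entries() {
    # -v inverts the match, -F treats the pattern as a fixed string.
    # grep exits non-zero when it prints nothing, so that case is tolerated.
    grep -v -F "/opt/scripts/upgrade/upg-mon.sh" || true
}

# Preview on the appliance (prints the entries that would remain, changes nothing):
#   crontab -u root -l | drop_upgrade_monitor_entries
```

If the preview shows nothing, the monitor script is the only entry in the root crontab, and step 7 leaves that crontab empty.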