Aria Automation Pods fail to initialize due to vco-pod crash

Article ID: 416209

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The Aria Automation UI is inaccessible and returns "500 Internal Server Error".
  • An upgrade of Aria Automation may fail:
    • The Aria Suite Lifecycle request for the upgrade is stuck at the stage: Error Code: LCMVRAVACONFIG90030 VMware Aria Automation VA Upgrade Status Check failed.
    • The Aria Automation appliances have been upgraded in the course of the upgrade workflow, but startup after the reboot is stuck.
  • Execution of the /opt/scripts/deploy.sh script fails, leaving pods stuck in the initializing state, Init:0/1 or Init:0/2 (verify with the command kubectl get pods -n prelude).
  • The vco-server-app.log pod log shows errors similar to the following during vco initialization:
    INFO vco [host='vco-app-<pod-id>' thread='ApplicaitonEventHandler-1' user='' org='' trace=''] {} com.vmware.o11n.plugin.vsphere.connect.DefaultHostSessionFactory - Loading session 'HostSessionKey [hostname=<hostId>, username=<user_account> SdkSession]'.
    ERROR vco [host='vco-app-<pod-id>' thread='ApplicaitonEventHandler-1' user='' org='' trace=''] {} com.vmware.o11n.plugin.vsphere.inventory.DefaultSdkManagedObjectFinder - Exception during executing finder for type 'VirtualMachine' and id '<host-id>,id:vm-<id>'.
    com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound: The object 'vim.VirtualMachine:vm-<id>' has already been deleted or has not been completely created. 
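
To narrow down which pods are affected, the output of kubectl get pods -n prelude can be filtered for the Init state described above. The following is a minimal sketch, not an official tool: the function name stuck_init_pods is hypothetical, and it assumes the default kubectl column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Filter `kubectl get pods` output (read from stdin) down to pods whose
# STATUS column begins with "Init:", for example Init:0/1 or Init:0/2.
# Prints "<pod-name> <status>" for each match.
# Assumption: STATUS is the third whitespace-separated column.
stuck_init_pods() {
  awk '$3 ~ /^Init:/ { print $1, $3 }'
}

# Usage on an appliance (not executed here):
#   kubectl get pods -n prelude | stuck_init_pods
```

An empty result means no pod is currently waiting on an init container; it does not by itself confirm that the pods are healthy.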

Environment

Aria Automation 8.x

Aria Orchestrator 8.x

Cause

  • The vco pod fails to initialize completely because, as part of its initialization, it has to index stale configuration elements managed by the VC plugin.
  • The indexing task retries every 10 seconds to resolve the references held by the configuration element. Because the referenced objects are no longer available, the references can never be resolved and the task goes into a loop.
  • This loop eventually causes the vco-service-app pod initialization to fail, and the pod ends up in a restart loop.
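
Since the loop is driven by references to objects that no longer exist, listing the object IDs named in the ManagedObjectNotFound faults may help identify which references are stale. The sketch below is illustrative only: the function name stale_vm_ids is hypothetical, and it assumes the IDs are numeric (vm-<number>) and that the log text is supplied on stdin, since the location of vco-server-app.log is not stated here.

```shell
# Print each distinct VM managed object ID (for example vm-1234) that
# appears in a ManagedObjectNotFound fault. Reads log text from stdin.
# Assumption: IDs follow the pattern vim.VirtualMachine:vm-<digits>.
stale_vm_ids() {
  grep 'ManagedObjectNotFound' \
    | grep -o 'vim\.VirtualMachine:vm-[0-9][0-9]*' \
    | sed 's/^vim\.VirtualMachine://' \
    | sort -u
}

# Usage (not executed here), with the log copied to the current directory:
#   stale_vm_ids < vco-server-app.log
```

The resulting IDs can then be compared against the configuration elements in Orchestrator when cleaning up stale references.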

Resolution

  • Recommendations when implementing custom profiles:
    • Avoid implementing them with an embedded Orchestrator.
    • Ensure that the configurations do not overcommit resources.
    • Ensure that valid snapshots are in place before making the changes.

  • Recommendations to handle consequences of stale configuration elements:
    • Ideal management of configuration elements:
      • Clean up or update configuration elements whenever the objects they reference are modified on the endpoint.
      • Example:
        • When a VM that is referenced by a configuration element in Orchestrator is deleted in vCenter, delete or update that reference.

    • Recovery in case of pod initialization failures: 
      • As a temporary workaround, disable the VC plugin before the reboot or pod refresh, and re-enable it after successful initialization:
        • Log in to the orchestration-ui page as an admin.
        • Navigate to `System Settings` ---> `Plug-ins`
        • Select the checkbox for the `VC` Plug-in, click `enable/disable` on the top menu, and validate that the plugin is now disabled.
        • Wait for Orchestrator to restart
          Docs ref: Install, update, or delete a plug-in
      • If the UI is not accessible but the pods have been recreated, attempt to disable the plugin temporarily from the command line: vracli vro command line utility or vracli capabilities vcoin --disable.
      • If the vracli command fails, reach out to Broadcom Support, as the plugin would need to be disabled from the database.
      • Once done, rebuild the pods: /opt/scripts/deploy.sh
      • Validate that the script completes successfully and that the UI is now accessible.
      • Navigate to the Orchestrator page to identify and delete the configuration elements with stale references. 
      • Enable the VC plugin: 
        • Log in to the orchestration-ui page as an admin.
        • Navigate to `System Settings` ---> `Plug-ins`
        • Select the checkbox for the `VC` Plug-in, click `enable/disable` on the top menu, and validate that the plugin is now enabled.
        • Wait for Orchestrator to restart
          Docs ref: Install, update, or delete a plug-in
      • Wait for the pods to restart, which takes around 10-15 minutes.
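
The final wait can be scripted rather than checked by hand. The following is a rough sketch under stated assumptions: the function names pods_ready and wait_for_prelude are hypothetical, the 900-second default simply mirrors the 15-minute upper bound mentioned above, and the readiness rule (every pod is Completed, or Running with all containers ready) is an assumption, not a documented health check.

```shell
# Return 0 if every pod in `kubectl get pods` output (stdin) is either
# Completed, or Running with all containers ready (READY column n/n).
# Returns 1 if any pod is not ready or if no pods are listed at all.
pods_ready() {
  awk '
    $1 == "NAME"      { next }     # skip the header row
                      { seen = 1 } # at least one pod was listed
    $3 == "Completed" { next }     # finished jobs count as healthy
    {
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2]) bad = 1
    }
    END { exit (bad || !seen) ? 1 : 0 }
  '
}

# Poll every 30 seconds until pods_ready succeeds or the timeout
# (in seconds, default 900) expires.
wait_for_prelude() {
  timeout="${1:-900}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if kubectl get pods -n prelude | pods_ready; then
      return 0
    fi
    sleep 30
    elapsed=$((elapsed + 30))
  done
  return 1
}
```

Running wait_for_prelude on an appliance returns success once all pods in the prelude namespace meet the rule above, and failure if the timeout passes first; the UI should still be validated manually afterwards.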