Troubleshooting VMware Aria Automation Orchestrator 8.x application start issues

Article ID: 322724

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • You are attempting to start VMware Aria Automation or Aria Automation Orchestrator 8.x with deploy.sh but the vco-server-app fails to start.
    • When running kubectl get pods -n prelude, the vco-app pod shows a large number of restarts.
    • When reviewing the vco-app logs, the start process appears to simply restart.
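
To confirm the restart symptom quickly, the pod listing can be filtered by restart count. The following is a minimal sketch, not an official tool: the function name high_restart_pods and the default threshold of 5 are illustrative assumptions, and it expects the standard table output of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Print the name and restart count of every pod whose RESTARTS value
# is at or above a threshold. Reads `kubectl get pods` output on stdin.
# The function name and the default threshold (5) are illustrative only.
high_restart_pods() {
    threshold="${1:-5}"
    # NR > 1 skips the header row; "$4 + 0" forces a numeric comparison.
    awk -v t="$threshold" 'NR > 1 && ($4 + 0) >= t { print $1, $4 }'
}

# Usage on the appliance (namespace taken from this article):
#   kubectl get pods -n prelude | high_restart_pods 5
#
# To read the log of the previous (crashed) container instance:
#   kubectl logs -n prelude <pod-name> -c vco-server-app --previous
```

Replace <pod-name> with a name printed by the filter. The --previous flag shows the log of the container instance that was terminated, which is usually the one that explains the restart.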


Environment

VMware Aria Automation Orchestrator 8.x
VMware Aria Automation 8.x

Resolution

See the Workaround below for additional information.

Workaround:

Troubleshooting Steps:

Procedure: Increasing Kubernetes health probe timeouts

  • The default timeout is 10 seconds in earlier versions of VMware Aria Automation Orchestrator and can be increased to 30 seconds or higher.
  • Set the probe values, in particular initialDelaySeconds, periodSeconds, and failureThreshold, similar to:
    initialDelaySeconds: 180
    timeoutSeconds: 10
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 20
Note: Improvements to these values were introduced in 8.12.x. Do not apply these instructions on version 8.12.x or later.
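
To see why these values give the application more time to start, the startup window can be estimated with simple arithmetic. This is a rough sketch under a stated assumption: it approximates the time before Kubernetes restarts a container whose liveness probe never succeeds as initialDelaySeconds + failureThreshold × periodSeconds. Actual timing also depends on timeoutSeconds and on how long each probe takes to fail. The function name restart_budget is hypothetical.

```shell
#!/bin/sh
# Rough estimate, in seconds, of how long a container may keep failing its
# liveness probe before it is restarted:
#   initialDelaySeconds + failureThreshold * periodSeconds
# Assumption: timeoutSeconds and scheduling jitter are ignored.
restart_budget() {
    initial_delay="$1"
    period="$2"
    failures="$3"
    echo $(( initial_delay + failures * period ))
}

restart_budget 10 10 3      # example liveness probe from step 3: prints 40
restart_budget 180 30 20    # suggested values above: prints 780
```

Under this approximation, the suggested values extend the window from well under a minute to roughly 13 minutes.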
  1. SSH into the Automation Orchestrator 8.x appliance.
  2. Use vi or vim to edit /opt/charts/vco/templates/deployment.yaml on each node in the cluster.
  3. Edit the liveness and readiness probe sections for the vco-server-app container. An example is shown below:
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /vco/api/health/liveness
                port: 8280
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 30
            name: vco-server-app
            ports:
            - containerPort: 8280
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /vco/api/health/readiness
                port: 8280
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 30
  4. Attempt to restart services by running:
    /opt/scripts/deploy.sh
Note: To reduce the load that contributes to these timeouts:
  • Ensure all web client tabs connecting to Automation Orchestrator are closed.
  • If running on a single node, scale to a 3-node cluster.
  • Upgrade to the latest version, where this behavior is improved.
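
Before editing deployment.yaml in step 2, it can help to keep a copy of the file and, after editing, to review the probe values in one place. The helpers below are a hypothetical sketch: the function names and the backup location are assumptions, and only the chart path comes from this article. The copy is written outside the chart's templates directory on the assumption that files left there could be picked up as chart templates.

```shell
#!/bin/sh
# Chart path from step 2 of the procedure; override CHART for testing.
CHART="${CHART:-/opt/charts/vco/templates/deployment.yaml}"

# Copy the file into a separate backup directory with a timestamp suffix.
# Assumption: the backup directory is outside the chart's templates folder.
backup_chart() {
    src="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" &&
        cp -p "$src" "$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
}

# Print probe-related keys with line numbers so the edit can be reviewed.
show_probe_settings() {
    grep -nE '(livenessProbe|readinessProbe|initialDelaySeconds|timeoutSeconds|periodSeconds|successThreshold|failureThreshold):' "$1"
}

# Usage on each node (the backup directory is an example location):
#   backup_chart "$CHART" /root/vco-chart-backup
#   vi "$CHART"
#   show_probe_settings "$CHART"
```

Running show_probe_settings on every node makes it easier to spot a node where the values were not changed consistently.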

Analyzing *.hprof files (Java heap dumps from Automation Orchestrator)

Symptoms

  • *.hprof files fill up a large amount of disk space.
  • After deleting the files filling up the disk, a new *.hprof file is generated and the process repeats before a user is able to log in to Automation Orchestrator.
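
To locate the heap dumps and see how much space each one uses, a search such as the following can be run on the appliance. This is a minimal sketch: the function name list_heap_dumps is hypothetical, and the search root is passed as an argument because this article does not state where the dumps are written.

```shell
#!/bin/sh
# List *.hprof files under a directory, largest first, with sizes in KiB.
# The search root defaults to / because the dump location is not known here.
list_heap_dumps() {
    find "${1:-/}" -type f -name '*.hprof' -exec du -k {} + 2>/dev/null | sort -rn
}

# Usage:
#   list_heap_dumps /      # search the whole appliance
#   df -h                  # check overall file system usage
```

Sorting by size puts the largest dump first, which is typically the first candidate to copy off the appliance for analysis before it is deleted.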

Cause

  • Custom workflows and actions may be consuming more Java heap than the application can handle, causing memory to be written to disk in *.hprof format and crashing the Orchestrator service.

Workaround

  • Contact your workflow developer. The following instructions are considered a development task when writing workflows for Automation Orchestrator.
  1. Download VisualVM to a system external to the Automation Orchestrator appliance.
  2. Extract the zip file.
  3. Generate a heap dump by following the instructions in the article "Error! 500 when attempting to generate a heap dump in Aria Orchestrator control center interface".
  4. Copy the *.hprof file to a location accessible by this external system.
  5. Start VisualVM. The executable is located in the ./bin directory.
  6. In the VisualVM explorer window, right-click on the Heap Dumps node and choose Load Heap Dump.
  7. Navigate to the location of your *.hprof file, select it, and click Open.
  8. Once the heap dump is loaded, it will appear as a node under Heap Dumps. Click on it to analyze the heap dump.
    1. Triage the issue by isolating threads or workflows consuming a large amount of memory.
      1. Refactor your code to reduce its Java heap usage.
    2. If the issue persists, try enabling Safe Mode by setting ch.dunes.safe-mode = true in Control Center under System Properties.
      1. Wait for the service to restart, then try accessing Automation Orchestrator again.


Additional Information

Impact/Risks:
VMware Aria Automation or Automation Orchestrator fails to properly boot. Workflows will fail to run until this is resolved.