Troubleshooting VMware Aria Automation Orchestrator 8.x application start issues

Article ID: 322724


Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • You attempt to start VMware Aria Automation or Aria Automation Orchestrator 8.x with deploy.sh, but the vco-server-app fails to start.
    • Running kubectl get pods -n prelude shows the vco-app pod with a large number of restarts.
    • When reviewing the vco-app logs, the start process appears to simply restart.
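The restart loop described above can be confirmed from the appliance command line. The kubectl commands below are standard, but the pod name is a placeholder and the sample output line is an illustrative assumption, so the awk step is demonstrated against that sample rather than a live cluster:

```shell
# Inspect pod status and the previous container's logs (run on the appliance):
#   kubectl get pods -n prelude
#   kubectl logs -n prelude <vco-app-pod-name> -c vco-server-app --previous
# Replace <vco-app-pod-name> with the name shown by "kubectl get pods".

# Pulling the RESTARTS column (4th field) out of the "get pods" output,
# demonstrated on a sample line (pod name and counts are made up):
sample='vco-app-0   2/3   CrashLoopBackOff   47   3h'
echo "$sample" | awk '{print "restarts:", $4}'
# → restarts: 47
```

A steadily climbing restart count, combined with logs that simply stop and begin again, points at the probe-timeout scenario handled below.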


Environment

VMware Aria Automation Orchestrator 8.x
VMware Aria Automation 8.x

Resolution

See the Workaround below for additional information.

Workaround:

Troubleshooting Steps:

Procedure: Increasing Kubernetes health probe timeouts

  • The default timeout is 10 seconds in earlier versions of VMware Aria Automation Orchestrator and can be increased to 30 seconds or higher.
  • Set the values for initialDelaySeconds, periodSeconds, and failureThreshold similar to:
    initialDelaySeconds: 180
    timeoutSeconds: 10
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 20
Note: Improvements to these values were introduced in 8.12.x. Do not implement these instructions on versions 8.12.x and later.
  1. SSH into the Automation Orchestrator 8.x appliance.
  2. Use vi or vim to edit /opt/charts/vco/templates/deployment.yaml on each node in the cluster.
  3. Edit the section for the vco-server-app container liveness and readiness probes. An example is below:
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /vco/api/health/liveness
                port: 8280
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 30
            name: vco-server-app
            ports:
            - containerPort: 8280
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /vco/api/health/readiness
                port: 8280
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 30
  4. Attempt to restart services by running:
    /opt/scripts/deploy.sh
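The probe edits in steps 2 and 3 can also be scripted. The sed sketch below assumes the default values shown in the example above; it is demonstrated on a temporary copy, and the result should be reviewed before running anything similar against /opt/charts/vco/templates/deployment.yaml:

```shell
# Work on a temporary copy containing the default probe values.
f=$(mktemp)
cat > "$f" <<'EOF'
            livenessProbe:
              failureThreshold: 3
              initialDelaySeconds: 10
              periodSeconds: 10
EOF

# Raise the thresholds to the values recommended in this article.
sed -i \
  -e 's/failureThreshold: 3/failureThreshold: 20/' \
  -e 's/initialDelaySeconds: 10/initialDelaySeconds: 180/' \
  -e 's/periodSeconds: 10/periodSeconds: 30/' "$f"

cat "$f"
```

The same substitutions would need to be applied on each node of a cluster, after which /opt/scripts/deploy.sh redeploys with the new probe settings.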
Note: To reduce the load that contributes to hitting these timeouts:
  • Ensure all web client tabs connecting to Automation Orchestrator are closed.
  • If running on a single node, scale to a 3-node cluster.
  • Upgrade to the latest version, as this behavior is improved in later releases.

Analyzing *.hprof files (Java heap dumps from Automation Orchestrator)

Symptoms

  • *.hprof files fill up a large amount of disk space.
  • After the files filling up the disk are deleted, a new *.hprof file is generated and the process repeats before a user is able to log in to the Automation Orchestrator services.
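To see how much space the dumps are consuming, the following sketch searches a directory for *.hprof files. The search root is up to you, since the dump location can vary by configuration; here it is demonstrated on a temporary directory seeded with an empty stand-in dump file:

```shell
# Seed a temporary directory with a fake heap dump for demonstration.
d=$(mktemp -d)
touch "$d/java_pid1234.hprof"

# List any heap dumps under the chosen root with their sizes.
find "$d" -name '*.hprof' -exec ls -lh {} \;
```

On the appliance, pointing the search at / (or the volume that is filling up) locates the real dumps before they are copied off for analysis or deleted.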

Cause

  • Custom workflows and actions may be consuming more Java heap than the application can sustain, causing memory to be written to disk in *.hprof format and crashing the Orchestrator service.

Workaround

  • Contact your workflow developer. The following instructions are considered a development task when writing workflows for Automation Orchestrator.
  1. Download VisualVM to a system external to the Automation Orchestrator appliance.
  2. Extract the zip file.
  3. Generate a heap dump by following instructions found in Error! 500 when attempting to generate a heap dump in Aria Orchestrator control center interface.
  4. Copy the *.hprof file to a location accessible by this external system.
  5. Start VisualVM. The executable is located in the ./bin directory.
  6. In the VisualVM explorer window, right-click on the Heap Dumps node and choose Load Heap Dump.
  7. Navigate to the location of your *.hprof file, select it, and click Open.
  8. Once the heap dump is loaded, it will appear as a node under Heap Dumps. Click on it to analyze the heap dump.
    1. Triage the issue by isolating threads or workflows consuming a large amount of memory.
      1. Refactor your code to be considerate of Java heap.
    2. If the issue persists, try enabling Safe Mode by setting ch.dunes.safe-mode = true in Control Center under System Properties.
      1. Monitor for the service to restart then try accessing Automation Orchestrator again.

Increase Java Heap (standalone vRO only) 

  • Scaling the heap memory of the vRealize Orchestrator Appliance is only applicable for standalone vRealize Orchestrator instances and is not supported for embedded vRealize Orchestrator instances in vRealize Automation.
  • Increase the RAM of the virtual machine on which vRealize Orchestrator is deployed to the next suitable increment. Because enough memory must be left available for the rest of the services, scale up the vRealize Orchestrator Appliance resources first. For example, if the desired heap memory is 7G, increase the appliance RAM by 4G, the difference between the desired heap and the default heap value of 3G.
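The sizing rule above reduces to simple arithmetic: the RAM to add equals the desired heap minus the default 3G heap. A minimal sketch using the 7G example from this article:

```shell
# Sizing helper based on the rule in this article: additional appliance RAM
# equals desired JVM heap minus the default heap of 3G.
default_heap_gb=3
desired_heap_gb=7
extra_ram_gb=$((desired_heap_gb - default_heap_gb))
echo "Increase appliance RAM by ${extra_ram_gb}G"
# → Increase appliance RAM by 4G
```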
  1. Log in to the vRealize Orchestrator Appliance command line over SSH as root.
  2. To create the custom profile directory and the required directory tree that is used when the profile is active, run the following script:
     
    vracli cluster exec -- bash -c 'base64 -d <<< IyBDcmVhdGUgY3VzdG9tIHByb2ZpbGUgZGlyZWN0b3J5Cm1rZGlyIC1wIC9ldGMvdm13YXJlLXByZWx1ZGUvcHJvZmlsZXMvY3VzdG9tLXByb2ZpbGUvCgojIENyZWF0ZSB0aGUgcmVxdWlyZWQgZGlyZWN0b3J5IHRyZWUgdGhhdCB3aWxsIGJlIHVzZWQgd2hlbiB0aGUgcHJvZmlsZSBpcyBhY3RpdmUKbWtkaXIgLXAgL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLwoKIyBDcmVhdGUgImNoZWNrIiBmaWxlIHRoYXQgaXMgYW4gZXhlY3V0YWJsZSBmaWxlIHJ1biBieSBkZXBsb3kgc2NyaXB0LgpjYXQgPDxFT0YgPiAvZXRjL3Ztd2FyZS1wcmVsdWRlL3Byb2ZpbGVzL2N1c3RvbS1wcm9maWxlL2NoZWNrCiMhL2Jpbi9iYXNoCmV4aXQgMApFT0YKY2htb2QgNzU1IC9ldGMvdm13YXJlLXByZWx1ZGUvcHJvZmlsZXMvY3VzdG9tLXByb2ZpbGUvY2hlY2sKCiMgQ29weSB2Uk8gcmVzb3VyY2UgbWV0cmljcyBmaWxlIHRvIHlvdXIgY3VzdG9tIHByb2ZpbGUKY2F0IDw8RU9GID4gL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLzkwLXJlc291cmNlcy55YW1sCnBvbHlnbG90UnVubmVyTWVtb3J5TGltaXQ6IDYwMDBNCnBvbHlnbG90UnVubmVyTWVtb3J5UmVxdWVzdDogMTAwME0KcG9seWdsb3RSdW5uZXJNZW1vcnlMaW1pdFZjbzogNTYwME0KCnNlcnZlck1lbW9yeUxpbWl0OiA2RwpzZXJ2ZXJNZW1vcnlSZXF1ZXN0OiA1RwpzZXJ2ZXJKdm1IZWFwTWF4OiA0RwoKY29udHJvbENlbnRlck1lbW9yeUxpbWl0OiAxLjVHCmNvbnRyb2xDZW50ZXJNZW1vcnlSZXF1ZXN0OiA3MDBtCkVPRgpjaG1vZCA2NDQgL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLzkwLXJlc291cmNlcy55YW1sCg== | bash'
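Piping a base64 blob straight to bash is opaque, so it can be worth decoding the payload first to review what it will do (replace bash with cat or less in the command above). The sketch below simply demonstrates that base64 encoding round-trips losslessly, using one of the mkdir commands from the decoded script as sample input:

```shell
# Encode a sample line (one of the mkdir commands from the decoded script),
# then decode it again to show the round trip is lossless.
script='mkdir -p /etc/vmware-prelude/profiles/custom-profile/'
blob=$(base64 <<< "$script")
base64 -d <<< "$blob"
# → mkdir -p /etc/vmware-prelude/profiles/custom-profile/
```

Decoding the full blob from step 2 shows it creates the custom profile directory tree, an executable check file, and the 90-resources.yaml file whose default contents are listed in step 4.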
  3. Edit the resource metrics file in your custom profile with the desired memory values.
     
    vi /etc/vmware-prelude/profiles/custom-profile/helm/prelude_vco/90-resources.yaml
  4. The 90-resources.yaml file should contain the following default properties:
     
    polyglotRunnerMemoryRequest: 1000M
    polyglotRunnerMemoryLimit: 6000M
    polyglotRunnerMemoryLimitVco: 5600M
     
    serverMemoryLimit: 6G
    serverMemoryRequest: 5G
    serverJvmHeapMax: 4G
     
    controlCenterMemoryLimit: 1.5G
    controlCenterMemoryRequest: 700m

     

     
  5. Modify the 90-resources.yaml with the following properties (if cluster, all 3 nodes):
     
    polyglotRunnerMemoryRequest: 1000M
    polyglotRunnerMemoryLimit: 7000M
    polyglotRunnerMemoryLimitVco: 6700M
     
    serverMemoryLimit: 9G
    serverMemoryRequest: 8G
    serverJvmHeapMax: 7G
     
    controlCenterMemoryLimit: 1.5G
    controlCenterMemoryRequest: 700m
     
     
  6. Save the changes to the resource metrics file and run the deploy.sh script.
     
    /opt/scripts/deploy.sh
     
 



Additional Information

Impact/Risks:
VMware Aria Automation or Automation Orchestrator fails to properly boot. Workflows will fail to run until this is resolved.