See the Workaround below for additional information.
Workaround:
Troubleshooting Steps:
- Validate that the issue does not match existing known issues:
- Validate the health of the environment:
- Troubleshoot Common Issues:
Procedure: Reinitialize each vco-app pod using Kubernetes Delete command
- SSH into the Automation Orchestrator 8.x appliance
- Verify pods using kubectl get pods -n prelude
- Run kubectl delete pod -n prelude vco-app-<UUID> for each vco-app pod instance
- Wait for each pod to be recreated and return to the Running state
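The delete step above can be sketched as a small shell snippet. The listing below is a simulated `kubectl get pods -n prelude` output (the pod names and column layout are assumptions for illustration); on a live appliance you would pipe the real `kubectl` output instead:

```shell
# Simulated `kubectl get pods -n prelude` output (names are hypothetical)
pods='NAME                      READY   STATUS    RESTARTS   AGE
vco-app-7d9f8b6c5-abcde   3/3     Running   0          2d
vco-app-7d9f8b6c5-fghij   3/3     Running   0          2d
orchestration-ui-app-1a   1/1     Running   0          2d'

# Select only the vco-app pod names; each would then be passed to
# `kubectl delete pod -n prelude <name>` so Kubernetes rebuilds it
vco_pods=$(printf '%s\n' "$pods" | awk '/^vco-app/ {print $1}')
printf '%s\n' "$vco_pods"
```

On a live system the equivalent one-liner would be `kubectl get pods -n prelude | awk '/^vco-app/ {print $1}' | xargs -r -n1 kubectl delete pod -n prelude`.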
Procedure: Reinitialize each vco-app pod using Kubernetes SCALE/UP commands
- SSH into the Automation Orchestrator 8.x appliance
- Verify how many vco-app instances are running using kubectl get pods -n prelude
- Run the following commands to scale the replicas down to zero, then wait two minutes:
kubectl scale deployment orchestration-ui-app --replicas=0 -n prelude
kubectl scale deployment vco-app --replicas=0 -n prelude
sleep 120
- Run the following commands to scale the replicas back up, using --replicas=1 for a single deployment or --replicas=3 for a clustered deployment:
kubectl scale deployment orchestration-ui-app --replicas=1 -n prelude
kubectl scale deployment vco-app --replicas=1 -n prelude
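Since the replica count differs between single and clustered deployments, the scale-up step can be parameterized. A minimal dry-run sketch (it only prints the commands rather than executing them; `NODE_COUNT` is a placeholder you would set to your actual node count):

```shell
NODE_COUNT=3   # 1 for a single-node deployment, 3 for a cluster

# Pick the replica count the procedure prescribes for each topology
if [ "$NODE_COUNT" -ge 3 ]; then
  REPLICAS=3
else
  REPLICAS=1
fi

# Dry run: print the commands instead of executing them
echo "kubectl scale deployment orchestration-ui-app --replicas=$REPLICAS -n prelude"
echo "kubectl scale deployment vco-app --replicas=$REPLICAS -n prelude"
```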
Procedure: Increasing Kubernetes health probe timeouts
Note: Improvements to these values were introduced in 8.12.x. Do not implement these instructions on versions 8.12.x and above.
- SSH into the Automation Orchestrator 8.x appliance.
- Use vi or vim to edit /opt/charts/vco/templates/deployment.yaml on each node in the cluster
- Edit the liveness and readiness probe sections for the vco-server-app container. An example is shown below:
livenessProbe:
failureThreshold: 3
httpGet:
path: /vco/api/health/liveness
port: 8280
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
name: vco-server-app
ports:
- containerPort: 8280
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /vco/api/health/readiness
port: 8280
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 30
- Attempt to restart services by running:
/opt/scripts/deploy.sh
Note: To decrease the load on these timeouts:
- Ensure all web client tabs connecting to Automation Orchestrator are closed.
- If running on a single node, scale to a 3-node cluster.
- Upgrade to the latest version as this behavior is improved upon.
Analyzing *.hprof files (Java heap dumps from Automation Orchestrator)
Symptoms
- *.hprof files fill up a large amount of disk space.
- After the files are deleted to free disk space, a new *.hprof file is generated and the cycle repeats before a user can log in to the Automation Orchestrator service.
Cause
- Custom workflows and actions may be consuming more Java heap than the application can sustain, causing the heap to be written to disk in *.hprof format and crashing the Orchestrator service.
Workaround
- Contact your workflow developer. The following instructions are considered a development task when writing workflows for Automation Orchestrator.
- Download VisualVM to a system external to the Automation Orchestrator appliance.
- Extract the zip file.
- Generate a heap dump by following the instructions in "Error! 500 when attempting to generate a heap dump in Aria Orchestrator control center interface".
- Copy the *.hprof file to a location accessible by this external system.
- Start VisualVM. The executable is located in the ./bin directory.
- In the VisualVM explorer window, right-click on the Heap Dumps node and choose Load Heap Dump.
- Navigate to the location of your *.hprof file, select it, and click Open.
- Once the heap dump is loaded, it will appear as a node under Heap Dumps. Click on it to analyze the heap dump.
- Triage the issue by isolating threads or workflows consuming a large amount of memory.
- Refactor your code to be considerate of Java heap.
- If the issue persists, try enabling Safe Mode by setting ch.dunes.safe-mode = true in Control Center under System Properties.
Note: In 8.18 and later, the Control Center has been removed and the property must be set using the "vracli vro" commands as described in the documentation: Additional command line interface configuration options
- Monitor for the service to restart then try accessing Automation Orchestrator again.
NOTE: If the pods keep going into CrashLoopBackOff and generating heap dumps, it is likely that Orchestrator is automatically retrying the failed workflow when it restarts. In this situation, you will need to cancel all executions:
- Run this command on one node:
vracli vro cancel executions
- Then remove any new hprof files and restart the pods if necessary.
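The cleanup step can be sketched with `find`. The snippet below demonstrates the pattern on a throwaway directory (the real dump location varies by deployment, so the path here is an assumption; substitute wherever your *.hprof files accumulate):

```shell
# Demonstrate locating and deleting *.hprof files in a scratch directory
dumpdir=$(mktemp -d)
touch "$dumpdir/java_pid1234.hprof" "$dumpdir/server.log"

# Delete only the heap dumps, leaving other files alone
find "$dumpdir" -name '*.hprof' -type f -delete

ls "$dumpdir"
```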
- Scaling the heap memory of the vRealize Orchestrator Appliance is only applicable to standalone vRealize Orchestrator instances; it is not supported for vRealize Orchestrator embedded in vRealize Automation.
- Increase the RAM of the virtual machine on which vRealize Orchestrator is deployed to the next suitable increment. Because enough memory must remain available for the rest of the services, the vRealize Orchestrator Appliance resources must be scaled up first. For example, if the desired heap memory is 7G, the appliance RAM should be increased by 4G, because the difference between the desired heap memory and the default heap value of 3G is 4G.
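The sizing rule above is simple arithmetic: the RAM increase equals the desired heap minus the default 3G heap. A quick check:

```shell
DEFAULT_HEAP_G=3    # default heap value from the example above
DESIRED_HEAP_G=7    # target heap memory

# RAM must grow by the same amount the heap grows
RAM_INCREASE_G=$((DESIRED_HEAP_G - DEFAULT_HEAP_G))
echo "Increase appliance RAM by ${RAM_INCREASE_G}G"
```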
- Log in to the vRealize Orchestrator Appliance command line over SSH as root.
- To create the custom profile directory and the required directory tree that is used when the profile is active, run the following script:
vracli cluster exec -- bash -c 'base64 -d <<< IyBDcmVhdGUgY3VzdG9tIHByb2ZpbGUgZGlyZWN0b3J5Cm1rZGlyIC1wIC9ldGMvdm13YXJlLXByZWx1ZGUvcHJvZmlsZXMvY3VzdG9tLXByb2ZpbGUvCgojIENyZWF0ZSB0aGUgcmVxdWlyZWQgZGlyZWN0b3J5IHRyZWUgdGhhdCB3aWxsIGJlIHVzZWQgd2hlbiB0aGUgcHJvZmlsZSBpcyBhY3RpdmUKbWtkaXIgLXAgL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLwoKIyBDcmVhdGUgImNoZWNrIiBmaWxlIHRoYXQgaXMgYW4gZXhlY3V0YWJsZSBmaWxlIHJ1biBieSBkZXBsb3kgc2NyaXB0LgpjYXQgPDxFT0YgPiAvZXRjL3Ztd2FyZS1wcmVsdWRlL3Byb2ZpbGVzL2N1c3RvbS1wcm9maWxlL2NoZWNrCiMhL2Jpbi9iYXNoCmV4aXQgMApFT0YKY2htb2QgNzU1IC9ldGMvdm13YXJlLXByZWx1ZGUvcHJvZmlsZXMvY3VzdG9tLXByb2ZpbGUvY2hlY2sKCiMgQ29weSB2Uk8gcmVzb3VyY2UgbWV0cmljcyBmaWxlIHRvIHlvdXIgY3VzdG9tIHByb2ZpbGUKY2F0IDw8RU9GID4gL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLzkwLXJlc291cmNlcy55YW1sCnBvbHlnbG90UnVubmVyTWVtb3J5TGltaXQ6IDYwMDBNCnBvbHlnbG90UnVubmVyTWVtb3J5UmVxdWVzdDogMTAwME0KcG9seWdsb3RSdW5uZXJNZW1vcnlMaW1pdFZjbzogNTYwME0KCnNlcnZlck1lbW9yeUxpbWl0OiA2RwpzZXJ2ZXJNZW1vcnlSZXF1ZXN0OiA1RwpzZXJ2ZXJKdm1IZWFwTWF4OiA0RwoKY29udHJvbENlbnRlck1lbW9yeUxpbWl0OiAxLjVHCmNvbnRyb2xDZW50ZXJNZW1vcnlSZXF1ZXN0OiA3MDBtCkVPRgpjaG1vZCA2NDQgL2V0Yy92bXdhcmUtcHJlbHVkZS9wcm9maWxlcy9jdXN0b20tcHJvZmlsZS9oZWxtL3ByZWx1ZGVfdmNvLzkwLXJlc291cmNlcy55YW1sCg== | bash'
- Edit the resource metrics file in your custom profile with the desired memory values.
vi /etc/vmware-prelude/profiles/custom-profile/helm/prelude_vco/90-resources.yaml
- The 90-resources.yaml file should contain the following default properties:
polyglotRunnerMemoryRequest: 1000M
polyglotRunnerMemoryLimit: 6000M
polyglotRunnerMemoryLimitVco: 5600M
serverMemoryLimit: 6G
serverMemoryRequest: 5G
serverJvmHeapMax: 4G
controlCenterMemoryLimit: 1.5G
controlCenterMemoryRequest: 700m
- Modify the 90-resources.yaml with the following properties (in a clustered deployment, on all 3 nodes):
polyglotRunnerMemoryRequest: 1000M
polyglotRunnerMemoryLimit: 7000M
polyglotRunnerMemoryLimitVco: 6700M
serverMemoryLimit: 9G
serverMemoryRequest: 8G
serverJvmHeapMax: 7G
controlCenterMemoryLimit: 1.5G
controlCenterMemoryRequest: 700m
- Save the changes to the resource metrics file and run the /opt/scripts/deploy.sh script.
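Before running deploy.sh, it is worth confirming the edited values stay internally consistent: the JVM heap must fit inside the container memory limit with headroom, and the request should not exceed the limit. A sketch of such a check (the parsing is an assumption; it relies on the simple `key: valueG` layout shown above):

```shell
# Write the example values from this article to a scratch copy of the file
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
serverMemoryLimit: 9G
serverMemoryRequest: 8G
serverJvmHeapMax: 7G
EOF

# Pull the numeric part of each G-suffixed value
get() { awk -F'[: G]+' -v k="$1" '$1 == k {print $2}' "$cfg"; }
limit=$(get serverMemoryLimit)
request=$(get serverMemoryRequest)
heap=$(get serverJvmHeapMax)

# Heap must be below the limit, and the request must not exceed the limit
[ "$heap" -lt "$limit" ] && [ "$request" -le "$limit" ] && echo "resource values consistent"
```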