Validating VMware Aria Automation and Automation Orchestrator 8.x health

Article ID: 326114

Updated On:

Products

VMware Aria Suite

Issue/Introduction

This article provides a list of steps to validate the health for the following appliances:

  • VMware Aria Automation (formerly VMware vRealize Automation) 8.x
  • Aria Automation Orchestrator (formerly vRealize Orchestrator) 8.x

If you are deploying new instances of the previously mentioned products, follow the article Troubleshooting VMware Aria Automation cloud proxies and On-Premises appliance deployments.

Environment

VMware Aria Automation 8.x
VMware Aria Automation Orchestrator 8.x

Resolution

Commands

Validating service and node health

kubectl get nodes
kubectl get pod -n prelude
vracli service status
vracli status first-boot
/opt/health/run.sh

Suggestions

  • If kubectl get nodes reports one of the nodes as NotReady, validate the disk space and power status of that node, and follow the article Troubleshooting Kubernetes disk pressure or disk latency in VMware Aria Automation and Automation Orchestrator 8.x.
  • If kubectl get pod -n prelude reports several pods as not running or unhealthy, the likely causes are:
    • The appliance was not restarted gracefully using VMware Aria Suite Lifecycle or by following Starting and stopping VMware Aria Automation.
    • The restart process failed. The log /var/log/deploy.log may provide more information.
    • Individual internal services are reporting errors. To analyze them, review the logs as described in the official product documentation under Log bundle structure and Displaying Logs.
  • When running the script /opt/health/run.sh, the most common issues are:
    • eth0-ip: Validate in vCenter that each appliance has a valid IP address.
    • disk-usage: Validate the disk space as described in the Validating disk space section below.
    • single-aptr: Validate DNS as described in the Validating DNS section below.
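
The first two checks above can be filtered so that only problem entries are shown. The following is a minimal sketch, not part of the product tooling: the function names not_ready_nodes and unhealthy_pods are hypothetical, and the parsing assumes the default kubectl column layout (NAME, STATUS for nodes; NAME, READY, STATUS for pods) as printed with the --no-headers option.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only parse text on stdin and do not call kubectl.

# Reads `kubectl get nodes --no-headers` output and prints the name of
# every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NF && $2 != "Ready" { print $1 }'
}

# Reads `kubectl get pod -n prelude --no-headers` output and prints pods
# that are neither Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '
    NF == 0 { next }            # skip blank lines
    $3 == "Completed" { next }  # finished jobs are expected
    {
      split($2, ready, "/")     # READY column looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}

# Usage on an appliance (not executed here):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pod -n prelude --no-headers | unhealthy_pods
```

Empty output from both filters suggests that all nodes are Ready and all pods are running. It does not replace vracli service status or /opt/health/run.sh.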

Validating disk space

  • For a single node, run the following commands:
    vracli disk-mgr
    df -hi
  • In a three-node cluster, run the following command:
    vracli cluster exec -- bash -c 'current_node; vracli disk-mgr; exit 0'
  • At least 20% of disk space must be free on every partition. If any partition is more than 80% full, find the files that are taking up the disk space using the command:
    find ID1 -size +100M -exec du -h {} \; | less

    Where ID1 is the mount point of the partition.

    For example:

    find / -size +100M -exec du -h {} \; | less
    find /data -size +100M -exec du -h {} \; | less
  • Old log bundles, database dumps, and Automation Orchestrator heap files may be safely removed if they are consuming any space. Log bundles are located under /home/root/log-bundle-YYYYMMDDTHHMMSS.tar.
  • Disk space may be expanded following Increase VMware Aria Automation appliance disk space.
  • If disk space issues persist for the /data partition, they may be caused by abnormal table growth.
    1. Run the following commands to isolate the largest databases in Postgres:
      vracli dev psql
      
      When prompted, type: yes
      
      SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;
    2. Once you have identified the largest databases, isolate the tables in each database that consume the most space by running the following commands, replacing ID2 with the database name:

      \c ID2
      SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC LIMIT 20;
      
  • The /data/db/backup directory may grow and become heavily utilized. This is normal behavior: the directory holds a point-in-time backup of the database taken before an upgrade or a patch.
  • Everything within the /data/db/backup directory can be safely deleted.

 

  • There may be some additional space used up by hprof files. These are Java heap dump files that are safe to delete. To check for these files, run the following command:
    find / -iname "*hprof" 2>/dev/null
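
The 20% free-space requirement described earlier in this section can also be checked mechanically. The sketch below is an illustration only: partitions_over is a hypothetical helper, and it parses the POSIX df -P layout (Capacity in column 5, mount point in column 6) rather than the output of vracli disk-mgr, whose format is not assumed here. Mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper; it only parses text on stdin and does not call df.

# Reads `df -P` output and prints every mount point whose usage is above
# the given percentage (default 80, i.e. less than 20% free).
partitions_over() {
  local limit="${1:-80}"
  awk -v limit="$limit" '
    NR > 1 {                    # skip the header line
      used = $5
      sub(/%/, "", used)        # "85%" -> "85"
      if (used + 0 > limit + 0) print $6, used "%"
    }'
}

# Usage on an appliance (not executed here):
#   df -P | partitions_over 80
```

Any mount point printed by this filter is a candidate for the find command shown above.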

Articles related to database growth issues:

Validating DNS

  1. Run the following commands on each appliance in the cluster:
    nslookup $( iface-ip eth0)
    nslookup $( uname -n)
  2. Run the following commands on each appliance in the cluster:

    /usr/bin/dig +noall +answer +nocookie -x $( iface-ip eth0 )
    /usr/bin/dig +noall +answer +noedns -x $( iface-ip eth0 )
    /usr/bin/dig +noall +answer -x $( iface-ip eth0 )
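
The reverse-lookup output from the dig commands above can be checked with a small filter. This is a sketch under stated assumptions: single_ptr_matches is a hypothetical helper, and the rule it applies (exactly one PTR record, pointing at the appliance FQDN) is inferred from the name of the single-aptr health check rather than taken from the product documentation. It parses the standard dig answer layout, where the record type is the fourth field and the target is the fifth.

```shell
#!/usr/bin/env bash
# Hypothetical helper; it only parses text on stdin and does not call dig.

# Reads `dig +noall +answer -x <ip>` output and succeeds only when exactly
# one PTR record is present and its target equals the expected FQDN
# (comparison ignores case and a trailing dot).
single_ptr_matches() {
  awk -v expected="$1" '
    $4 == "PTR" { n++; target = $5 }
    END {
      sub(/\.$/, "", target)
      sub(/\.$/, "", expected)
      exit !(n == 1 && tolower(target) == tolower(expected))
    }'
}

# Usage on an appliance (not executed here; assumes uname -n returns the FQDN):
#   /usr/bin/dig +noall +answer -x "$(iface-ip eth0)" | single_ptr_matches "$(uname -n)" \
#     && echo "PTR OK" || echo "PTR mismatch or multiple PTR records"
```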
    

Scenarios

Load Balancer

  • Log in to the configured load balancer and check whether it reports any errors.
  • Validate that the appliance(s) can communicate with the load balancer.

VMware Aria Automation Orchestrator

A helpful set of health validation tools can be found in the Automation Orchestrator Control Center.

  1. Log in to Control Center with root credentials: https://AutomationOrchestratorFQDN/vco-controlcenter.
  2. Click Validate Configuration.
  3. Check whether any errors are reported.
    1. If the system is configured to use VMware vCenter as the Authentication Provider and the certificate has been recently replaced, go to the Configuration Authentication Provider page and revalidate the integration.

VMware Identity Manager Health

Additional Commands

Best Practices

  1. Power VMware Aria Automation on and off from VMware Aria Suite Lifecycle or by following Starting and stopping VMware Aria Automation.
  2. Custom modifications to the OS of the appliances are not supported. See VMware Virtual Appliances and customizations to operating system and included packages.
  3. Keep the appliances upgraded to a supported version.
    1. The supported versions may be validated on the product lifecycle page of the Broadcom Support Portal.
    2. The interoperability may be validated at https://interopmatrix.vmware.com/Interoperability.
  4. If the root password was reset, update the password in VMware Aria Suite Lifecycle Locker. In VCF-enabled mode, the root password is managed by VMware SDDC Manager.
  5. If VMware Identity Manager is unhealthy, the VMware Aria Automation identity-service pod will fail to start.
  6. The UI will not load if you attempt to access the application using the FQDN of an individual node. Only the load balancer FQDN may be used to access the application from your preferred web browser.