This article provides a list of steps to validate the health of the following appliances:
If you are deploying new instances of the previously mentioned products, follow the article Troubleshooting VMware Aria Automation cloud proxies and On-Premises appliance deployments.
kubectl get nodes
kubectl get pod -n prelude
vracli service status
vracli status first-boot
/opt/health/run.sh
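As a minimal sketch for scanning the output of the first two commands above, the helpers below read the tabular text on standard input and print only the entries that need attention. The function names not_ready_nodes and unhealthy_pods are hypothetical and not part of the product. The sketch assumes the default kubectl column layout (NAME, STATUS, ... for nodes; NAME, READY, STATUS, ... for pods).

```shell
# Hypothetical helper: print nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin. A cordoned node
# ("Ready,SchedulingDisabled") is also printed so it gets a second look.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Hypothetical helper: print pods that are not fully up.
# Reads `kubectl get pod -n prelude` output on stdin.
# Completed pods (finished jobs) are skipped; a pod is printed when its
# STATUS is not "Running" or its READY count (for example 1/2) is incomplete.
unhealthy_pods() {
    awk 'NR > 1 {
        if ($3 == "Completed") next
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```

For example, kubectl get nodes | not_ready_nodes and kubectl get pod -n prelude | unhealthy_pods each print nothing when every entry looks healthy.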
If kubectl get nodes reports one of the nodes as Not Ready, validate the disk space and power status, and see Troubleshooting Kubernetes disk pressure or disk latency in VMware Aria Automation and Automation Orchestrator 8.x.
If kubectl get pod -n prelude reports several pods as not running or unhealthy, follow the guidance below:
/var/log/deploy.log may provide more information.
When running /opt/health/run.sh, the most common issues are:
vracli disk-mgr
df -hi
vracli cluster exec -- bash -c 'current_node; vracli disk-mgr; exit 0'
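To turn the disk usage check into a quick pass or fail list, the following sketch filters df output for file systems at or above a usage threshold. The function name full_filesystems and the default threshold of 80 percent are assumptions chosen for illustration, not product values. It expects the POSIX output format of df -P (or df -Pi for inodes) on standard input and does not handle mount points that contain spaces.

```shell
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# Usage: df -P | full_filesystems 80      (disk space)
#        df -Pi | full_filesystems 80     (inodes)
# Column 5 is the usage percentage and column 6 the mount point.
full_filesystems() {
    threshold="${1:-80}"   # assumed default; adjust as needed
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)             # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

An empty result means no file system reached the threshold.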
find ID1 -size +100M -exec du -h {} \; | less
Where ID1 is the name of the partition.
For example:
find / -size +100M -exec du -h {} \; | less
find /data -size +100M -exec du -h {} \; | less
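The find commands above can be wrapped in a small function that sorts the results so the largest files appear first. This is a sketch only: the name large_files is hypothetical, and unlike the commands above it adds -xdev so the search stays on the file system of the starting directory, and -type f so only regular files are listed. It assumes GNU find, du, and sort.

```shell
# Hypothetical helper: list regular files larger than a size threshold,
# largest first, without crossing into other mounted file systems.
# Usage: large_files /data 100M
large_files() {
    dir="$1"
    min_size="${2:-100M}"   # same 100M threshold as the commands above
    find "$dir" -xdev -type f -size +"$min_size" -exec du -h {} + 2>/dev/null |
        sort -rh
}
```

Because of -xdev, run it once per partition of interest, for example large_files / and then large_files /data.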
/home/root/log-bundle-YYYYMMDDTHHMMSS.tar
If there are disk space issues on the /data partition, they may be caused by abnormal table growth.
vracli dev psql
Type: yes
SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;
Once you have identified the largest database, isolate the tables in that database that consume the most space by running the following commands, replacing ID2 with the database name:
\c ID2
SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC LIMIT 20;
Check the /data/db/backup directory.
The /data/db/backup directory can grow and become heavily utilized. This is normal behavior, as the directory holds a point-in-time backup of the database taken prior to an upgrade or a patch.
Files in the /data/db/backup directory can be safely deleted.
Check for hprof files. These are Java heap dump files that can also be deleted. To check for these files, run the following command:
find / -iname "*hprof" 2>/dev/null
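Before deleting anything, it can help to know how much space the heap dumps occupy. The sketch below counts the matching files and adds up their sizes. The function name heap_dump_summary is hypothetical, and the -printf option assumes GNU find.

```shell
# Hypothetical helper: count *hprof files under a directory and total their size.
# Usage: heap_dump_summary /        (defaults to / when no directory is given)
heap_dump_summary() {
    find "${1:-/}" -type f -iname '*hprof' -printf '%s\n' 2>/dev/null |
        awk '{ total += $1; n++ }
             END { printf "%d files, %.0f bytes\n", n, total }'
}
```

The function only reports; it does not delete any files.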
hprof files are safe to delete.
Articles related to database growth issues:
nslookup $( iface-ip eth0)
nslookup $( uname -n)
Run the following commands on each appliance in the cluster:
/usr/bin/dig +noall +answer +nocookie -x $( iface-ip eth0 )
/usr/bin/dig +noall +answer +noedns -x $( iface-ip eth0 )
/usr/bin/dig +noall +answer -x $( iface-ip eth0 )
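To compare a reverse lookup result against the name you expect, the following sketch reads dig answer lines on standard input and succeeds only if one of the PTR records matches. The function name ptr_matches and the host name vra01.example.com are hypothetical. It assumes the usual five-field answer format (name, TTL, class, type, target).

```shell
# Hypothetical helper: succeed (exit 0) if any PTR record in the dig answer
# on stdin points at the expected fully qualified name.
# Usage: /usr/bin/dig +noall +answer -x "$(iface-ip eth0)" | ptr_matches vra01.example.com
ptr_matches() {
    expected="${1%.}"   # tolerate a trailing dot on the expected name
    awk -v want="$expected" '
        $4 == "PTR" {
            name = $5
            sub(/\.$/, "", name)       # drop the trailing dot from the answer
            if (tolower(name) == tolower(want)) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

An empty answer section, which is what dig prints when no PTR record exists, makes the function fail, so a missing reverse record is reported the same way as a mismatched one.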
A helpful set of health validation tools can be found in the Automation Orchestrator Control Center.
https://AutomationOrchestratorFQDN/vco-controlcenter
/opt/scripts/deploy.sh
vracli ntp show-config
vracli ntp status
Validating the product license:
vracli license
vracli version
See Build numbers and versions for VMware Aria Automation (formerly VMware vRealize Automation) for additional information.