Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

This article provides a list of steps to validate the health of the following appliances:

VMware Aria Automation (formerly VMware vRealize Automation) 8.x
Aria Automation Orchestrator (formerly vRealize Orchestrator) 8.x

If you are deploying new instances of the previously mentioned products, follow the article Troubleshooting VMware Aria Automation cloud proxies and On-Premises appliance deployments.

Environment

VMware Aria Automation 8.x
VMware Aria Automation Orchestrator 8.x

Resolution

Commands

Validating service and node health

kubectl get nodes
kubectl get pod -n prelude
vracli service status
vracli status first-boot
/opt/health/run.sh

Suggestions

If kubectl get nodes reports one of the nodes as Not Ready, validate the disk space and power status of that node, and follow Troubleshooting Kubernetes disk pressure or disk latency in VMware Aria Automation and Automation Orchestrator 8.x.
If kubectl get pod -n prelude reports several pods as not running or unhealthy, the likely causes are:
- The appliance was not restarted gracefully using VMware Aria Suite Lifecycle or following Starting and stopping VMware Aria Automation.
- The restart process failed. The log /var/log/deploy.log may provide more information.
- There are some errors specific to individual internal services. To analyze this information, validate the logs following the official product documentation under Log bundle structure and Displaying Logs.
When running the script /opt/health/run.sh, the most common issues are:
- eth0-ip: Validate in vCenter that the appliance(s) have a valid IP address.
- disk-usage: Validate the disk space as described in the Validating disk space section below.
- single-aptr: Validate DNS as described in the Validating DNS section below.
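
The two kubectl checks above can be scripted. The following is a minimal sketch, not part of the product tooling: the function names not_ready_nodes and unhealthy_pods are hypothetical, and it assumes the default kubectl column layout (NAME, STATUS first for nodes; NAME, READY, STATUS first for pods) with the --no-headers option.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse text on stdin and do not call kubectl themselves.

# Print the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NF && $2 != "Ready" { print $1 }'
}

# Print pods that are not healthy: anything that is not Completed and is
# either not Running or does not have all of its containers ready (READY x/y with x != y).
unhealthy_pods() {
  awk 'NF {
    if ($3 == "Completed") next
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage on an appliance:
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pod -n prelude --no-headers | unhealthy_pods
```

Empty output from both pipelines suggests that all nodes are Ready and all pods are Running or Completed. A node in a state such as Ready,SchedulingDisabled is also listed, because the comparison is exact.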

Validating disk space

For a single node, run the following commands:
```
vracli disk-mgr
df -hi
```

In a three-node cluster, run the following command:

vracli cluster exec -- bash -c 'current_node; vracli disk-mgr; exit 0'

At least 20% free disk space is required on all partitions. If any partition is more than 80% full, find the files that are taking up the disk space using the command:

find ID1 -size +100M -exec du -h {} \; | less

Where ID1 is the name of the partition.

For example:

find / -size +100M -exec du -h {} \; | less
find /data -size +100M -exec du -h {} \; | less
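
The requirement of at least 20% free space corresponds to at most 80% used. A minimal sketch of that check follows. The function name partitions_over is hypothetical, and it assumes the POSIX df -P output format (Capacity in column 5, mount point in column 6) rather than the output of vracli disk-mgr.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and prints every mount point
# whose used capacity is above the given percentage (default 80).
partitions_over() {
  awk -v limit="${1:-80}" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)            # strip the trailing percent sign
    if (pct + 0 > limit) print $6, $5
  }'
}

# Usage on an appliance:
#   df -P | partitions_over 80
```

Any mount point printed by this pipeline is a candidate for the find command shown above.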

Old log bundles, database dumps, and Automation Orchestrator heap files may be safely removed if they are consuming any space. Log bundles are located under /home/root/log-bundle-YYYYMMDDTHHMMSS.tar.
Disk space may be expanded following Increase VMware Aria Automation appliance disk space.

If disk space issues persist for the /data partition, they may be caused by abnormal table growth.

Run the following commands to isolate the largest databases in Postgres:

vracli dev psql

Type: yes

SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;

Once you have identified the largest databases, isolate the tables in a given database that consume the most space by running the following commands, replacing ID2 with the database name:

\c ID2
SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC LIMIT 20;

Additional Info regarding disk usage specific to the /data/db/backup directory.
You may find that the /data/db/backup directory grows and becomes heavily utilized. This is normal behavior as this directory is used as a snapshot in time for a backup of the DB prior to an upgrade or a patch.
Everything within the /data/db/backup directory can be safely deleted.
There may be some additional space used up by hprof files. These are Java heap dump files that can also be deleted. To check for these files, run the following command:
```
find / -iname "*hprof" 2>/dev/null
```
hprof files are safe to delete.
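
To see which heap dumps are worth removing first, the search above can be combined with du. This is a minimal sketch. The function name list_heap_dumps is hypothetical, and sizes are reported in kilobytes as measured by du -k.

```shell
#!/bin/sh
# Hypothetical helper: list heap dump files under a directory (default /)
# with their disk usage in KB, largest first. It lists files only and deletes nothing.
list_heap_dumps() {
  find "${1:-/}" -type f -iname '*hprof' -exec du -k {} + 2>/dev/null | sort -rn
}

# Usage on an appliance:
#   list_heap_dumps /
```

Review the list before deleting anything, so that only files no longer needed for troubleshooting are removed.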


Validating DNS

Run the following commands on each of the appliance(s) in the cluster:
```
nslookup $( iface-ip eth0)
nslookup $( uname -n)
```
- DNS records must use fully qualified domain names (FQDNs), not short names.
- A single A record and a single PTR record are required on initial installation. CNAMEs are supported for multi-tenancy. See Set up multi-organization tenancy for VMware Aria Automation.
  - Once you have installed Aria Automation 8.x, you may Change the VIP for VMware Aria Automation 8.x installations to a new CNAME.
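
The record requirements above can be checked in one pass. The following is a minimal sketch under stated assumptions: the function names is_fqdn, count_records, and check_single_a_ptr are hypothetical, the dig utility is assumed to be available, and a name is treated as an FQDN if it contains at least one dot between labels.

```shell
#!/bin/sh
# Hypothetical helpers for the single A / single PTR requirement.

# Succeed when the name contains at least one dot between labels (not a short name).
is_fqdn() {
  case "$1" in
    .*|*..*) return 1 ;;
    *.*)     return 0 ;;
    *)       return 1 ;;
  esac
}

# Count the non-empty lines on stdin (one line per record in `dig +short` output).
count_records() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Report how many A records the name has and how many PTR records the IP has.
# Note: if the name is a CNAME, `dig +short` also prints the CNAME target,
# so the A count will be higher than 1.
check_single_a_ptr() {
  name=$1
  ip=$2
  is_fqdn "$name" || { echo "$name is not an FQDN"; return 1; }
  a=$(dig +short A "$name" | count_records)
  ptr=$(dig +short -x "$ip" | count_records)
  echo "A records: $a, PTR records: $ptr"
  [ "$a" -eq 1 ] && [ "$ptr" -eq 1 ]
}

# Usage on an appliance:
#   check_single_a_ptr "$(uname -n)" "$(iface-ip eth0)"
```

A non-zero exit status indicates a short name, a missing record, or more than one record of a given type.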

Run the following commands on each appliance in the cluster:

/usr/bin/dig +noall +answer +nocookie -x $( iface-ip eth0 )
/usr/bin/dig +noall +answer +noedns -x $( iface-ip eth0 )
/usr/bin/dig +noall +answer -x $( iface-ip eth0 )

Scenarios

If the responses are blank, you must create a PTR record.
If only the last command succeeds and a Microsoft AD DNS server is being used, please review the Microsoft article titled Some DNS name queries are unsuccessful after you deploy a Windows-based DNS server.
If you must update a record in DNS, follow the instructions located under Update the DNS assignment for VMware Aria Automation. Do not make manual changes to the DNS configuration.
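
The first two scenarios can be told apart mechanically by checking which of the three dig queries returned an answer. This is a minimal sketch. The function name ptr_scenario and the labels it prints are hypothetical, and any combination not described above is reported as mixed without further interpretation.

```shell
#!/bin/sh
# Hypothetical helper. Arguments are the captured outputs of the three queries,
# in order: +nocookie, +noedns, and the plain reverse lookup.
ptr_scenario() {
  if [ -z "$1" ] && [ -z "$2" ] && [ -z "$3" ]; then
    echo "no-ptr-record"               # all responses blank: create a PTR record
  elif [ -z "$1" ] && [ -z "$2" ]; then
    echo "only-plain-query-answered"   # only the last command succeeded
  elif [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]; then
    echo "all-answered"
  else
    echo "mixed"                       # not covered by the scenarios above
  fi
}

# Usage on an appliance:
#   ip=$(iface-ip eth0)
#   ptr_scenario "$(/usr/bin/dig +noall +answer +nocookie -x "$ip")" \
#                "$(/usr/bin/dig +noall +answer +noedns -x "$ip")" \
#                "$(/usr/bin/dig +noall +answer -x "$ip")"
```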

Load Balancer

Log in to the configured load balancer and check whether it reports any errors.
Validate the appliance(s) can communicate with the load balancer.

VMware Aria Automation Orchestrator

A helpful set of health validation tools is available in the Automation Orchestrator Control Center.

Log in to Control Center using root credentials: https://AutomationOrchestratorFQDN/vco-controlcenter.
Click on Validate Configuration.
Validate if there are any errors.
1. If the system is configured to use VMware vCenter as the Authentication Provider and the certificate has been recently replaced, go to the Configuration Authentication Provider and revalidate the integration.

VMware Identity Manager Health

VMware Aria Automation is fully dependent upon a functioning authentication source, VMware Identity Manager 3.x. If VMware Identity Manager is unhealthy, follow the article titled Validating Workspace One Access 3.3.x (formerly VMware Identity Manager) health.
- Once VMware Identity Manager is healthy, run the deploy script to start VMware Aria Automation: /opt/scripts/deploy.sh.

Additional Commands

Validating NTP:

vracli ntp show-config
vracli ntp status

Validating the product license:
```
vracli license
```
Validating the product version:
```
vracli version
```
See Build numbers and versions for VMware Aria Automation (formerly VMware vRealize Automation) for additional information.

Best Practices

Power VMware Aria Automation on and off from VMware Aria Suite Lifecycle or by following Starting and stopping VMware Aria Automation.
Custom modifications to the OS of the appliances are not supported. See VMware Virtual Appliances and customizations to operating system and included packages.
Keep the appliances upgraded to a supported version.
1. The supported versions may be validated in the product lifecycle information on the Broadcom support portal.
2. The interoperability may be validated at Interoperability matrix.
If the root password was reset, update the password in VMware Aria Suite Lifecycle Locker. In VCF-enabled mode, the root password is managed by VMware SDDC Manager.
If VMware Identity Manager is unhealthy, VMware Aria Automation identity-service pod will fail to start.
The UI will not load if you attempt to access the application using the FQDN of an individual node. Only the load balancer FQDN may be used to access the application from your preferred web browser.

Validating VMware Aria Automation and Automation Orchestrator 8.x health

Article ID: 326114

Updated On: