VMware implements internal health checks against the Elasticsearch/OpenSearch service to maintain vRealize Automation 7.x application reliability, because the embedded VMware Identity Manager instances rely heavily on Elasticsearch/OpenSearch during normal operation.
This article contains common troubleshooting steps for restoring the health of an embedded Elasticsearch/OpenSearch cluster, single-node or multi-node, within the vRealize Automation 7.x appliance(s).
Symptoms:
VMware Identity Manager 3.3.x
VMware vRealize Automation 7.x
Datacenter network and storage outages can cause UNASSIGNED shards to accumulate and persist in a cluster over time as Elasticsearch/OpenSearch runs shard assignment tasks during cluster recovery.
Elasticsearch/OpenSearch, a search and analytics engine used for auditing, reports, and directory sync logs, is embedded within the VMware vRealize Automation / Identity Manager virtual appliance. To verify the health of Elasticsearch/OpenSearch, use the curl tool. If curl is not installed on a Windows machine, you can run the query from a Linux or Mac machine instead: curl http://<localhost>:9200/_cluster/health?pretty
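The health query above returns a JSON document, and the field of interest is "status". The following is a minimal sketch of one way to extract it, assuming the service answers on localhost:9200 and that no JSON tool such as jq is available on the appliance. The helper name extract_status is hypothetical and is not part of any VMware tooling.

```shell
#!/bin/sh
# Sketch: pull the "status" value out of a _cluster/health response.
# extract_status is a hypothetical helper name, not part of any VMware tooling.

extract_status() {
    # Reads the health JSON on stdin and prints the status value
    # (green, yellow, or red). Works with and without ?pretty.
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage on the appliance (assumes the service listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health?pretty' | extract_status
```

The function only reads standard input, so it can also be run against a saved copy of the health output.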
Impact/Risks:
The shard is the unit at which Elasticsearch/OpenSearch distributes data around the cluster. The speed at which Elasticsearch/OpenSearch can move shards when rebalancing data, e.g. following a failure, depends on the size and number of shards as well as on network and disk performance.
Removing CLUSTER_RECOVERED and other stale UNASSIGNED shards has little to no impact on a running cluster. If shards remain UNASSIGNED for an extended period of time, unexpected application behavior may occur, including a failure of the health status check for Elasticsearch/OpenSearch.
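Before removing anything, it helps to see which shards are UNASSIGNED and why. The sketch below is read-only and relies on the _cat/shards API with an explicit column list. It assumes the service answers on localhost:9200 and that the embedded version supports the unassigned.reason column. The helper name filter_unassigned is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only UNASSIGNED rows from _cat/shards output.
# filter_unassigned is a hypothetical helper name.
# Assumes the columns were requested in this order:
#   index shard prirep state unassigned.reason

filter_unassigned() {
    # Column 4 is the shard state; print the row only when it is UNASSIGNED.
    awk '$4 == "UNASSIGNED"'
}

# Usage on the appliance (assumes the service listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | filter_unassigned
```

Rows whose last column reads CLUSTER_RECOVERED correspond to the stale shards described above.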
Health Status: curl http://localhost:9200/_cluster/health?pretty=true
Green: everything is good; there are enough nodes in the cluster to ensure at least 2 full copies of the data spread across the cluster.
Yellow: functioning, but there are not enough nodes in the cluster to ensure HA (e.g., a single-node cluster will always be in the yellow state because it can never hold 2 copies of the data).
* For a single node: Elasticsearch/OpenSearch reports yellow by default because there is no cluster to hold a second copy of the data. This is expected and is not a problem as long as the application functions normally.
Red: broken; unable to query existing data or store new data, typically because there are not enough nodes in the cluster to function or the disk is out of space.
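The three states can be folded into a simple check that treats yellow as acceptable on a single node. This is a minimal sketch, not the health check VMware runs internally. The function name health_verdict and the OK/WARN/CRIT labels are hypothetical, and the node count is assumed to come from the number_of_nodes field of the health response.

```shell
#!/bin/sh
# Sketch: turn a health status and node count into a one-word verdict.
# health_verdict and the OK/WARN/CRIT labels are hypothetical.

health_verdict() {
    status=$1
    nodes=${2:-0}   # number_of_nodes from the health response; 0 if not given
    case "$status" in
        green)
            echo OK ;;
        yellow)
            # Yellow is expected on a single node, which cannot hold a second copy.
            if [ "$nodes" -eq 1 ]; then echo OK; else echo WARN; fi ;;
        red)
            echo CRIT ;;
        *)
            echo UNKNOWN ;;
    esac
}

# Usage:
#   health_verdict yellow 1
#   health_verdict yellow 3
```

Passing the node count explicitly keeps the single-node exception visible instead of hiding it inside the status parsing.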