Analyzing root causes of Postgres Cluster issues on Workspace One (vIDM VMware identity Manager)

Article ID: 322709

Products

VMware Aria Suite

Issue/Introduction

This article describes the suggested steps to determine the root cause of Postgres Cluster issues. These issues are typically caused by network events and/or ungraceful reboots.

The article Troubleshooting VMware Identity Manager postgres cluster deployed through vRealize Suite Lifecycle Manager provides the steps suggested to recover from Postgres Cluster issues. To reboot a Workspace ONE Access (VMware Identity Manager) cluster, it is suggested to use the Power OFF and Power ON operations from vRealize Suite Lifecycle Manager; alternatively, the article Graceful Shutdown and Power On of a VMware Identity Manager PostgreSQL cluster can be followed.

As a best practice, validate that VMware Aria Suite Lifecycle (formerly known as vRealize Suite Lifecycle Manager, vRSLCM) has the current root and sshuser passwords of the Workspace ONE Access nodes stored in Locker. VMware Aria Suite Lifecycle runs Scheduled health checks and is able to automatically recover from some replication delays in the Postgres Cluster, which helps to improve the availability of the Aria Suite products.

Symptoms:

vIDM Postgres Health is unhealthy as indicated in the article Troubleshooting VMware Identity Manager postgres cluster deployed through vRealize Suite Lifecycle Manager

Environment

VMware Identity Manager 3.3.x

Resolution

Workaround:

Prerequisites

Collect the following information:

  1. Run the following commands on each of the Workspace ONE Access (vIDM) nodes:
journalctl -u systemd-networkd
journalctl --list-boots
  2. Collect the log /var/log/pgService/pgService.log from the primary node of the Postgres Cluster. To identify the primary node, follow the article 367175 (see also the example command after this list).
  3. Collect a log bundle from vRSLCM, or the current and old versions of the Engine log /var/log/vrlcm/vmware_vrlcm.log. This log includes historical data about the status of the PgCluster; if the vIDM root/sshuser passwords in Locker are out of date, this information will not be available.
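
As a reference, the primary node of the Postgres Cluster can also be identified by running the following command on any vIDM node. This is a hedged example based on the show pool_nodes query used by the scheduled health check described later in this article; supply the pgpool password stored in /usr/local/etc/pgpool.pwd if prompted.
su postgres -c "/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
The node reported with the role primary (or master, depending on the version) is the primary node of the cluster.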

Analysis

  1. The vRSLCM Engine logs /var/log/vrlcm/vmware_vrlcm-##.log provide insights into when events occurred on the nodes. You can filter the historical data of the LCM logs using grep -A 5 -r "replication_delay". In the output of this grep command, you can validate historically:

  • The status of each node (up or down).
  • Whether there was a replication delay on the nodes.
  • The last column, last_status_change, which indicates when events occurred in the cluster. This column reports when a node changed from up to down.
node_id | hostname | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | last_status_change
---------+---------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | Host1   | 5432 |   up   | 0.333333  | master  |     0      |      false        |        0          | 2019-10-14 06:05:42
 1       | Host2   | 5432 |   up   | 0.333333  | standby |     0      |      false        |        0          | 2019-10-14 06:05:42
 2       | Host3   | 5432 |   up   | 0.333333  | standby |     0      |      true         |        0          | 2019-10-14 06:05:42
(3 rows)
  2. Validate whether there are network events on each of the nodes using the output of journalctl -u systemd-networkd around the timestamps found in step 1, or around the time the Postgres Cluster was down.

The first line of the output of this command will indicate when the logs begin. Here is an example of the first line of the logs with a sample network event.

journalctl -u systemd-networkd
-- Logs begin at Fri 2022-03-04 05:35:39 UTC, end at Wed 2022-04-20 09:13:31 UTC. --
...
...
Apr 07 17:06:29 vidmnode1.lab.com systemd-networkd[1374]: eth0: Lost carrier
Apr 07 17:06:29 vidmnode1.lab.com systemd-networkd[1374]: eth0: Gained carrier
Apr 07 17:06:29 vidmnode1.lab.com systemd-networkd[1374]: eth0: Configured
  3. Search for errors in the log /var/log/pgService/pgService.log. Here are some examples of network events:

    2022-05-27T05:57:32.821495+00:00 vidmnode1 pgpool[2488]: [4980-1] 
    2022-05-27 05:57:32: pid 2488: WARNING: network IP is removed and system has no IP is assigned 
    2022-05-27T05:57:32.821608+00:00 vidmnode1 pgpool[2488]: [4980-2] 
    2022-05-27 05:57:32: pid 2488: DETAIL: changing the state to in network trouble 
    2022-05-27T05:57:32.821646+00:00 vidmnode1 pgpool[2488]: [4981-1] 
    2022-05-27 05:57:32: pid 2488: LOG: watchdog node state changed from [MASTER] to [IN NETWORK TROUBLE] 
    2022-05-27T05:57:32.822902+00:00 vidmnode1 pgpool[2488]: [4982-1] 
    2022-05-27 05:57:32: pid 2488: FATAL: system has lost the network 
    2022-05-27T05:57:32.822951+00:00 vidmnode1 pgpool[2488]: [4983-1] 
    2022-05-27 05:57:32: pid 2488: LOG: Watchdog is shutting down 
    2022-05-27T05:57:32.823026+00:00 vidmnode1 pgpool[14653]: [4981-1] 
    2022-05-27 05:57:32: pid 14653: LOG: watchdog: de-escalation started
    
  4. Validate the timestamps of the node reboots from the output of journalctl --list-boots. All the nodes should have been rebooted at similar times if a graceful reboot was followed. A combined reference example of these checks is shown below.
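
As a quick reference, the checks above can be combined into a short sequence of commands. This is a hedged example: the first command is run on the vRSLCM appliance, the others on each vIDM node; the log paths are the ones referenced in the steps above, and the --since/--until timestamps are placeholders to be adapted to the window identified in step 1.
grep -A 5 -r "replication_delay" /var/log/vrlcm/
grep -iE "network|fatal|warning" /var/log/pgService/pgService.log
journalctl -u systemd-networkd --since "2022-04-07 16:00" --until "2022-04-07 18:00"
journalctl --list-boots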

VMware Aria Suite Lifecycle (vRSLCM) notifications.

VMware Aria Suite Lifecycle changed the notification triggers for cluster issues starting with 8.8.2:

  • vRealize Suite Lifecycle Manager 8.8.2 or later.

    • RED flag – Cluster is unhealthy: Postgres is down or there are high replication delays (not auto-recoverable).

    • YELLOW flag - Cluster is unhealthy: there are auto-recoverable replication delays.

    • GREEN flag - Cluster is healthy: all the nodes are up and there are no replication delays.

  • Versions older than vRealize Suite Lifecycle Manager 8.8.2.

    • RED flag – Cluster is unhealthy. This can imply: there are auto-recoverable replication delays, one of the nodes is down, Postgres is down, or there are high replication delays (not auto-recoverable).

    • GREEN flag - Cluster is healthy: all the nodes are up and there are no replication delays.

Additional Information

Understanding the Scheduled health checks. 

VMware Aria Suite Lifecycle constantly monitors the vIDM health, as explained in Scheduled health checks. This section provides a description of the tasks executed during the Scheduled health checks.

A full example of the Scheduled vIDM health check is attached to this article.

  1. In the first part of the health check, VMware Aria Suite Lifecycle validates that it can connect to vIDM using the root account and run the ls command on each of the vIDM nodes.
2022-10-26 21:14:31.660 INFO  [pool-3-thread-42] c.v.v.l.v.c.u.VidmPgpoolUtil -  -- Performing ping related checks as parameter set to true.
2022-10-26 21:14:31.665 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.102
2022-10-26 21:14:31.667 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: ls
2022-10-26 21:14:31.713 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command --> ls
2022-10-26 21:14:32.762 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- exit-status: 0
2022-10-26 21:14:32.762 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
  2. VMware Aria Suite Lifecycle validates which vIDM node has the delegated IP assigned.
2022-10-26 21:14:34.961 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.102
2022-10-26 21:14:34.961 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: ifconfig eth0:0 | grep 'inet addr:' | cut -d: -f2
2022-10-26 21:14:35.010 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command --> ifconfig eth0:0 | grep 'inet addr:' | cut -d: -f2
2022-10-26 21:15:05.058 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- exit-status: 0
2022-10-26 21:15:05.058 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
2022-10-26 21:15:05.058 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Command Status code :: 0
…
2022-10-26 21:15:05.059 INFO  [pool-3-thread-42] c.v.v.l.v.c.u.VidmPgpoolUtil -  -- delegateIP is assigned to 192.168.20.102
2022-10-26 21:15:05.059 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.102
  3. VMware Aria Suite Lifecycle validates the status of the pgService on each of the nodes.
2022-10-26 21:15:05.059 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: /etc/init.d/pgService status
2022-10-26 21:15:05.110 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command --> /etc/init.d/pgService status
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- exit-status: 0
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Command Status code :: 0
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Output Stream ::
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:15:06.158 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Getting pgpool service status pgpool service is running
  4. VMware Aria Suite Lifecycle obtains the pgpool password.
2022-10-26 21:15:08.359 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: cat /usr/local/etc/pgpool.pwd
2022-10-26 21:15:08.408 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command --> cat /usr/local/etc/pgpool.pwd
  5. VMware Aria Suite Lifecycle validates which node is the Master of the cluster.
2022-10-26 21:15:38.464 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.102
2022-10-26 21:15:38.464 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: su root -c "echo -e MXMXMXMX|/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool"
2022-10-26 21:15:38.516 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command --> su root -c "echo -e MXMXMXMX|/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool"
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- exit-status: 0
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Command Status code :: 0
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Output Stream ::
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:15:39.562 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- 3 NO vidm83node1.autolab.local:9999 Linux vidm83node1.autolab.local 192.168.20.101

vidm83node2.autolab.local:9999 Linux vidm83node2.autolab.local vidm83node2.autolab.local 9999 9000 7 STANDBY
vidm83node1.autolab.local:9999 Linux vidm83node1.autolab.local 192.168.20.101 9999 9000 4 MASTER
vidm83node3.autolab.local:9999 Linux vidm83node3.autolab.local 192.168.20.103 9999 9000 7 STANDBY
  6. VMware Aria Suite Lifecycle validates whether the nodes are up, whether there is a replication delay, and which node is the primary.
2022-10-26 21:16:09.662 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND ::  su postgres -c "echo -e MXMXMXMX|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
2022-10-26 21:16:09.713 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Executing command -->  su postgres -c "echo -e MXMXMXMX|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
2022-10-26 21:16:10.757 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- exit-status: 0
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Command Status code :: 0
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- Output Stream ::
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  -- ====================================================
2022-10-26 21:16:10.758 INFO  [pool-3-thread-42] c.v.v.l.v.d.h.VidmUtil -  --  node_id |    hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change  
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | 192.168.20.101 | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | 2022-10-15 00:31:01
 1       | 192.168.20.102 | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 | 2022-10-15 00:14:16
 2       | 192.168.20.103 | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | 2022-10-15 00:33:05
(3 rows)
  7. VMware Aria Suite Lifecycle confirms the vIDM health status.
2022-10-26 21:17:44.418 INFO  [pool-3-thread-42] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- vIDM Cluster health status is GREEN !!!
2022-10-26 21:17:44.418 INFO  [pool-3-thread-42] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- Skipping update of existing cluster health notification status level
2022-10-26 21:17:44.418 INFO  [pool-3-thread-42] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- Updating already existing notification with key globalenvironment,vidm,vidmClusterHealthNotification
2022-10-26 21:17:44.427 INFO  [pool-3-thread-42] c.v.v.l.p.a.s.Task -  -- Injecting Edge :: OnVidmClusterHealthNotify
2022-10-26 21:17:44.428 INFO  [pool-3-thread-42] c.v.v.l.p.a.s.Task -  -- ========================================
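
For manual verification, the same checks can be run directly on a vIDM node. This is a hedged reference based on the commands shown in the log excerpts above; supply the pgpool password stored in /usr/local/etc/pgpool.pwd when prompted.
/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool
su postgres -c "/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""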

VMware Aria Suite Lifecycle periodically runs this task in order to heal auto-recoverable replication delays caused by network glitches.

The full output of this task is available in the Attachments section as a reference.

If VMware Aria Suite Lifecycle is not able to SSH to vIDM, the task will fail and the cluster will be reported as unhealthy.


Here is an example of the log output when SSH connection fails.

2023-03-05 03:59:50.156 INFO  [pool-3-thread-48] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: ls
2023-03-05 03:59:53.869 ERROR [pool-3-thread-48] c.v.v.l.u.SessionHolder -  -- SessionHolder.newSession Exception encountered
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:519) ~[jsch-0.1.54.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.newSession(SessionHolder.java:53) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.<init>(SessionHolder.java:37) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.executeWithKnownTimeout(SshUtils.java:620) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.runCommandWithKnownTimeout(SshUtils.java:477) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:185) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 03:59:53.923 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- Exception cause : com.jcraft.jsch.JSchException: Auth fail
2023-03-05 03:59:53.923 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- JSchException encountered
2023-03-05 03:59:53.929 ERROR [pool-3-thread-48] c.v.v.l.v.c.u.VidmPgpoolUtil -  -- Exception while validating SSH root credentials of the vIDM host - 192.168.20.102
com.vmware.vrealize.lcm.util.exception.SshAuthenticationFailureException: Cannot execute ssh commands on the host - 192.168.20.102, validate the SSH login credentials.
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:193) ~[vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 03:59:53.931 INFO  [pool-3-thread-48] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.103
2023-03-05 03:59:53.931 INFO  [pool-3-thread-48] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: ls
2023-03-05 04:00:02.639 ERROR [pool-3-thread-48] c.v.v.l.u.SessionHolder -  -- SessionHolder.newSession Exception encountered
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:519) ~[jsch-0.1.54.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.newSession(SessionHolder.java:53) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.<init>(SessionHolder.java:37) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.executeWithKnownTimeout(SshUtils.java:620) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.runCommandWithKnownTimeout(SshUtils.java:477) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:185) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 04:00:02.641 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- Exception cause : com.jcraft.jsch.JSchException: Auth fail
2023-03-05 04:00:02.641 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- JSchException encountered
2023-03-05 04:00:02.642 ERROR [pool-3-thread-48] c.v.v.l.v.c.u.VidmPgpoolUtil -  -- Exception while validating SSH root credentials of the vIDM host - 192.168.20.103
com.vmware.vrealize.lcm.util.exception.SshAuthenticationFailureException: Cannot execute ssh commands on the host - 192.168.20.103, validate the SSH login credentials.
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:193) ~[vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 04:00:02.643 INFO  [pool-3-thread-48] c.v.v.l.v.d.h.VidmUtil -  -- vIDM ENDPOINT HOST :: 192.168.20.101
2023-03-05 04:00:02.643 INFO  [pool-3-thread-48] c.v.v.l.v.d.h.VidmUtil -  -- COMMAND :: ls
2023-03-05 04:00:05.857 ERROR [pool-3-thread-48] c.v.v.l.u.SessionHolder -  -- SessionHolder.newSession Exception encountered
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:519) ~[jsch-0.1.54.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.newSession(SessionHolder.java:53) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SessionHolder.<init>(SessionHolder.java:37) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.executeWithKnownTimeout(SshUtils.java:620) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.util.SshUtils.runCommandWithKnownTimeout(SshUtils.java:477) [lcm-util-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:185) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 04:00:05.864 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- Exception cause : com.jcraft.jsch.JSchException: Auth fail
2023-03-05 04:00:05.864 ERROR [pool-3-thread-48] c.v.v.l.u.SshUtils -  -- JSchException encountered
2023-03-05 04:00:05.867 ERROR [pool-3-thread-48] c.v.v.l.v.c.u.VidmPgpoolUtil -  -- Exception while validating SSH root credentials of the vIDM host - 192.168.20.101
com.vmware.vrealize.lcm.util.exception.SshAuthenticationFailureException: Cannot execute ssh commands on the host - 192.168.20.101, validate the SSH login credentials.
        at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmUtil.runVidmSshCommand(VidmUtil.java:193) ~[vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.clustering.util.VidmPgpoolUtil.checkClusterStatus(VidmPgpoolUtil.java:829) [vmlcm-vidmplugin-driver-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.vidm.core.task.notification.VidmClusterHealthNotificationTask.execute(VidmClusterHealthNotificationTask.java:91) [vmlcm-vidmplugin-core-8.10.0-SNAPSHOT.jar!/:?]
        at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:63) [vmlcm-engineservice-core-8.10.0-SNAPSHOT.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-03-05 04:00:05.875 INFO  [pool-3-thread-48] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- Final Status is not empty : ---->
Unable to create SSH connection to the VMware Identity Manager host(s): [192.168.20.102, 192.168.20.103, 192.168.20.101] as root user from vRSLCM. Ensure vRSLCM inventory has the right root passwords YXYXYXYX Identity Manager nodes and not expired. Trigger inventory sync of globalenvironment in vRSLCM to update the current root password YXYXYXYX
2023-03-05 04:00:05.876 INFO  [pool-3-thread-48] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- vIDM Cluster health status is RED !!!
2023-03-05 04:00:05.877 INFO  [pool-3-thread-48] c.v.v.l.v.c.t.u.VidmInstallTaskUtil -  -- vIDM Configuration property vidmClusterHealthKBLink value is obtained from Config Service : https://kb.vmware.com/s/article/75080
2023-03-05 04:00:05.878 INFO  [pool-3-thread-48] c.v.v.l.v.c.t.n.VidmClusterHealthNotificationTask -  -- Updating already existing notification with key globalenvironment,vidm,vidmClusterHealthNotification
2023-03-05 04:00:05.885 INFO  [pool-3-thread-48] c.v.v.l.p.a.s.Task -  -- Injecting Edge :: OnVidmClusterHealthNotify
2023-03-05 04:00:05.886 INFO  [pool-3-thread-48] c.v.v.l.p.a.s.Task -  -- ========================================
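
When the scheduled health check reports this SSH failure, a quick way to confirm whether the credentials stored in vRSLCM are still valid is to attempt an SSH login to a vIDM node as root (a hedged example using the sample node address from the log above; replace it with your own node FQDN or IP):
ssh root@192.168.20.102
If the login fails, update the root password in Locker and trigger an inventory sync of globalenvironment in vRSLCM, as suggested in the notification message above.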

Attachments

Scheduled vIDM Health check - full output