Identity Manager pgpool service is down on one node / replication delay - Logins to connected applications fail


Article ID: 409939


Updated On:

Products

VMware Aria Suite

Issue/Introduction

  • Aria Suite Lifecycle (ASL) reports health issues with VMware Identity Manager (vIDM), specifically indicating that the pgpool service is down on one or more nodes and/or that there is a significant replication delay within the vIDM PostgreSQL cluster. This signifies an unhealthy state of the underlying database infrastructure, impacting vIDM's stability and functionality.
  • Login attempts to connected applications such as Aria Automation, Aria Operations, or NSX fail with a 502 Bad Gateway error because the VMware Identity Manager application is not reachable through the load balancer.
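
The 502 symptom described above can be confirmed from any workstation before opening ASL. The following is a minimal Python sketch, not a VMware-provided tool: the function names and the verdict strings are illustrative assumptions, and the URL passed to probe() must be replaced with the load-balanced vIDM address of the environment.

```python
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status code to a coarse verdict (labels are illustrative)."""
    if code == 502:
        # The load balancer answered, but no healthy vIDM backend served the request.
        return "backend-unreachable"
    if 200 <= code < 400:
        return "reachable"
    return "unexpected"


def probe(url, timeout=10.0):
    """Send one GET to the load-balanced vIDM URL and return the verdict."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS error, or timeout: no HTTP answer at all.
        return "no-response"
```

For example, probe("https://vidm-lb.example.com/") (a hypothetical hostname) returning "backend-unreachable" matches the symptom in this article, while "no-response" points to a problem in front of vIDM, such as the load balancer itself or name resolution.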

Environment

VMware Identity Manager (vIDM) deployed as a cluster.

Aria Suite Lifecycle (ASL) managing the vIDM environment.

Cause

The observed symptoms (down pgpool service, replication delay) are direct indicators that the vIDM PostgreSQL cluster is in an unhealthy state. An unhealthy PostgreSQL cluster prevents proper data synchronization between database nodes. This can occur due to various reasons, including network interruptions, service crashes, or database corruption, leading to a breakdown in inter-node communication and data consistency. Aria Suite Lifecycle's health monitoring detects these critical service disruptions and replication anomalies, reporting the vIDM environment as unhealthy.
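
Both indicators named above (a node that pgpool marks as down, and a non-zero replication delay) are visible in the table that pgpool-II returns for its SHOW POOL_NODES command. The sketch below assumes that output has already been captured as text; how to run the command on a vIDM node is covered by the troubleshooting KB referenced under Resolution and is not shown here. The sample table, host names, and threshold are hypothetical, and the exact column set varies by pgpool-II version.

```python
def parse_pool_nodes(text):
    """Parse psql-style table output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        if "|" not in line:
            continue  # skips the "---+---" separator and any "(N rows)" footer
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_nodes(rows, max_delay=0):
    """Return (hostname, reason) pairs for nodes that are down or lagging."""
    problems = []
    for row in rows:
        # Newer pgpool-II releases print "up"/"down"; older ones print numeric codes (3 = down).
        if row.get("status") in ("down", "3"):
            problems.append((row["hostname"], "down"))
            continue
        try:
            delay = int(row.get("replication_delay", "0"))
        except ValueError:
            continue  # unknown delay format; leave it for manual inspection
        if delay > max_delay:
            problems.append((row["hostname"], "replication delay %d" % delay))
    return problems


# Hypothetical sample output, trimmed to the columns used above.
SAMPLE = """\
 node_id | hostname | port | status | role    | replication_delay
---------+----------+------+--------+---------+-------------------
 0       | node-a   | 5432 | up     | primary | 0
 1       | node-b   | 5432 | down   | standby | 0
 2       | node-c   | 5432 | up     | standby | 4096
(3 rows)
"""

for host, reason in unhealthy_nodes(parse_pool_nodes(SAMPLE)):
    print(host, reason)
```

In this sample, node-b would be reported as down and node-c as lagging, which corresponds to the two conditions ASL flags for the vIDM cluster.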

Resolution

To resolve the unhealthy state of the vIDM PostgreSQL cluster and restore proper service operation and replication, leverage the built-in remediation capabilities within Aria Suite Lifecycle. ASL is designed to automatically detect and correct common issues within managed environments, including database cluster inconsistencies.

Steps:

  1. Log in to Aria Suite Lifecycle: Access the ASL user interface with appropriate administrative credentials.
  2. Navigate to the vIDM Environment.
  3. Initiate Remediation: Select the global environment that is reporting health issues. Look for and select the "Remediate" option.
  4. Monitor Progress: ASL will initiate a process to diagnose and fix the underlying issues within the vIDM cluster, which may include restarting services, re-establishing database connections, and resynchronizing replication. Monitor the task progress within the ASL UI.

If the Remediate task does not complete or ends with an error, follow the steps in this KB article to resolve the cluster issues:
Troubleshooting VMware Identity Manager postgres cluster deployed through Aria Suite Lifecycle (vRSLCM)