Postgres stuck in CrashLoopBackOff causing TCA authentication to fail for users using Active Directory

Article ID: 345725

Updated On: 07-24-2024

Products

VMware Telco Cloud Automation

Issue/Introduction

The Postgres database in some Telco Cloud Automation (TCA) instances crashes repeatedly and fails to start.

# To check the status of the Postgres pods, log in to TCA over SSH and run the following commands in the shell.
export KUBECONFIG=/home/admin/.kube/config
kubectl get pods -n {namespace}
# {namespace} is tca-mgr for TCA Manager and tca-system for TCA Control Plane.
 
----------------------------------------------
For example:
[admin@tca-mgr~]$ kubectl get pods -n tca-mgr
 NAME                         READY   STATUS             RESTARTS        AGE
 postgresql-ha-postgresql-0   0/1     CrashLoopBackOff   12 (2m9s ago)   24d # postgres pod is in crashing state
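When the namespace holds many pods, the listing can be narrowed to the crashing ones. The following is a minimal sketch, not part of the official procedure: `crashing_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column order shown above (NAME, READY, STATUS).

```shell
# Hypothetical helper (not from the KB article): reads `kubectl get pods`
# output on stdin and prints the names of pods whose STATUS column is
# CrashLoopBackOff. Assumes the default columns: NAME READY STATUS RESTARTS AGE.
crashing_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (namespace is tca-mgr or tca-system):
#   kubectl get pods -n tca-mgr | crashing_pods
```

An empty result means no pod in that namespace is currently reported as CrashLoopBackOff.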

This causes authentication to fail on TCA Manager if the customer uses Active Directory based authentication. The TCA UI shows the error "Invalid user name or password".

# Error returned by TCA when trying to log in.
/hybridity/api/sessions
{
  "timestamp" : "2023-05-01T20:19:25.210+00:00",
  "status" : 401,
  "error" : "Unauthorized",
  "message" : "com.vmware.tca.keycloak.KeycloakRestClientException: [GET_ACCESS_TOKEN_FAILED]: Failed to login user. realm: orchestrationAD, username: *****, password: *****. Error: Connect to localhost:8180 [localhost/127.0.0.1] failed: Connection refused (Connection refused)",
  "path" : "/hybridity/api/sessions"
}
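To tell this failure apart from a genuinely wrong password, a saved response body can be checked for the two markers in the message above. This is a sketch under assumptions: `matches_keycloak_refused` is a hypothetical helper name, and it only tests the response text, so the pod status should still be confirmed with `kubectl get pods`.

```shell
# Hypothetical helper (not from the KB article): reads a
# /hybridity/api/sessions response body on stdin and exits 0 only if a
# single line contains both markers seen in this article, in this order:
# the GET_ACCESS_TOKEN_FAILED code and "Connection refused".
matches_keycloak_refused() {
  grep -q 'GET_ACCESS_TOKEN_FAILED.*Connection refused'
}

# Usage, assuming the response body was saved to response.json:
#   matches_keycloak_refused < response.json && echo "signature matches"
```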

Environment

2.2.0

Cause

This happens if the secrets of the Postgres service have been replaced, or if older Docker images of Postgres are present.

Resolution

Resolved in TCA 2.3

Workaround:

For existing TCA 2.2.0 deployments, follow the instructions below:

SSH to the Telco Cloud Automation shell as admin and switch to the root user.
Download the attached forceRestartPostgres.sh script and run it with the following command:

bash +x forceRestartPostgres.sh

Note: This script applies only to TCA 2.2.0. It can take up to 15 minutes, depending on the environment, because it brings down Postgres, deletes older Postgres images, deletes the Postgres persistent volume claim, and then force-starts Postgres.

Verify that the Postgres pods are running without crashes using the following commands.

# To check the status of the Postgres pods, log in to TCA over SSH and run the following commands in the shell.
export KUBECONFIG=/home/admin/.kube/config
kubectl get pods -n {namespace}
# {namespace} is tca-mgr for TCA Manager and tca-system for TCA Control Plane.
 
----------------------------------------------
For example:
[admin@tca-mgr~]$ kubectl get pods -n tca-system
NAME                                        READY   STATUS      RESTARTS        AGE
postgresql-ha-postgresql-0                  1/1     Running     0               4d16h  # postgres pod is in running state
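Instead of re-running the listing by hand, the check can block until the pod reports Ready. The sketch below is an assumption-laden convenience, not part of the official procedure: `wait_for_postgres` is a hypothetical helper name, the pod name is taken from the example output above, and the 900-second default mirrors the "up to 15 minutes" note. `kubectl wait` returns an error if the pod does not exist yet, so it may need to be retried while the pod is being recreated.

```shell
# Hypothetical helper (not from the KB article): waits for the Postgres pod
# to become Ready, failing once the timeout expires.
#   $1 = namespace (tca-mgr or tca-system)
#   $2 = optional timeout, default 900s
wait_for_postgres() {
  kubectl wait --for=condition=Ready pod/postgresql-ha-postgresql-0 \
    -n "$1" --timeout="${2:-900s}"
}

# Usage:
#   export KUBECONFIG=/home/admin/.kube/config
#   wait_for_postgres tca-mgr
```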

Attachments

forceRestartPostgres
