DX APM - ng-acc-configserver-deployment pod in CrashLoopBackOff due to FATAL: password authentication failed for user "<user>"

Article ID: 212835

Products

CA Application Performance Management (APM / Wily / Introscope)

Issue/Introduction

After reinstalling DX Platform, everything seems to work as expected except ACC: its pods remain in CrashLoopBackOff and Init status.

ng-acc-configserver-db-deployment-854d85f7cb-wxq6m   1/1     Running            1          36m
ng-acc-configserver-deployment-654b587dd9-xqmvf      0/1     CrashLoopBackOff   10         29m
ng-acc-repository-deployment-67d4c577db-hqvhn        0/2     Init:0/1           0          27m
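
To spot the affected pods quickly, the listing can be filtered for anything that is not fully ready. This is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The function name `unready_pods` is hypothetical and not part of DX APM.

```shell
# Print pods whose READY count is incomplete or whose STATUS is not Running.
# Reads `kubectl get pods` output on stdin (hypothetical helper).
unready_pods() {
  awk '
    $1 == "NAME" { next }            # skip the header row, if present
    NF >= 3 {
      split($2, r, "/")              # READY column, e.g. "0/1"
      if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
    }
  '
}

# Usage (assumes kubectl access to the cluster):
#   kubectl get pods -n <namespace> | grep ng-acc | unready_pods
```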

We tried to recreate the acc-configserver-db as described in the following KB article, but the problem persists:

DX APM - ng-acc-configserver pods in CrashLoopBackOff due to ACC postgres DB corruption
https://knowledge.broadcom.com/external/article?articleId=144154

The database is up and running, but the ng-acc-configserver-deployment pod log shows the following error:

org.postgresql.util.PSQLException: FATAL: password authentication failed for user "<user>"
        at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:525)
        at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:146)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:197)

 

Environment

DX Application Performance Management 20.2

Cause

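An ACC postgres security setting (scram-sha-256 authentication in pg_hba.conf) prevents the ACC pods from authenticating against the database, because the database user does not have a valid SCRAM verifier.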

How to verify the condition?

1) Try to log on to the ACC database using the service hostname "ng-acc-configserver-db-service" instead of localhost. psql prompts for the password (-W); the last argument is the database name.

kubectl exec -ti <ng-acc-configserver-db-deployment-pod> -n <namespace> -- sh
psql -W -h ng-acc-configserver-db-service -U <username> <database>

2) Check ACC postgres log

cd <nfs-dir>/acc/cs/db-logs/

ls -ltr

tail -f postgresql-<timestamp>.log

Look for the message: User "<user>" does not have a valid SCRAM verifier.
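
The same check can be done without tailing each file by hand. This is a minimal sketch, assuming the logs are plain-text files named postgresql-*.log in the given directory. `scram_errors` is a hypothetical helper name.

```shell
# List every "SCRAM verifier" message found in the postgres logs of a directory.
# $1: log directory, e.g. <nfs-dir>/acc/cs/db-logs
scram_errors() {
  # -H prefixes each match with the name of the file it came from
  grep -H 'does not have a valid SCRAM verifier' "$1"/postgresql-*.log
}

# Usage:
#   scram_errors <nfs-dir>/acc/cs/db-logs
```

The function returns a non-zero status when no log contains the message, so it can also be used in a shell condition.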

 

3) Check postgres configuration:

kubectl exec -ti <ng-acc-configserver-db-deployment-pod> -n <namespace> -- sh
psql -W -h ng-acc-configserver-db-service -U <username> <database>
cat /var/lib/postgresql/data/pg_hba.conf
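
Only uncommented lines in pg_hba.conf take effect, so it can help to view just those. This is a minimal sketch. `active_hba_rules` is a hypothetical helper, and the path in the usage line is the one used in this article.

```shell
# Print the active (non-blank, non-comment) rules of a pg_hba.conf file.
# $1: path to the file
active_hba_rules() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage (inside the database pod):
#   active_hba_rules /var/lib/postgresql/data/pg_hba.conf
```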

 

 

Resolution

1) Connect to the postgres DB pod

kubectl get pods -n <namespace> | grep ng-acc-configserver-db-deployment

kubectl exec -ti <ng-acc-configserver-db-deployment-pod> -n <namespace> -- sh

2) Check that you can connect to the postgres DB

psql -U <username> <database>

To exit: \q

3) Edit pg_hba.conf

vi /var/lib/postgresql/data/pg_hba.conf

Comment out the scram-sha-256 line:

#host all all all scram-sha-256
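
The same edit can be scripted with sed instead of vi. This is a minimal sketch, assuming the rule starts with `host` and names `scram-sha-256` as its method, as in the line shown here. `comment_scram_rules` is a hypothetical helper, and it keeps a `.bak` copy of the original file.

```shell
# Comment out every active "host ... scram-sha-256" rule in a pg_hba.conf file.
# $1: path to the file. The original is saved as <file>.bak.
comment_scram_rules() {
  # Lines that are already commented start with "#" and therefore do not match.
  sed -i.bak -E '/^[[:space:]]*host[[:space:]].*scram-sha-256/ s/^/#/' "$1"
}

# Usage (inside the database pod):
#   comment_scram_rules /var/lib/postgresql/data/pg_hba.conf
```

If the file contains more than one such rule, all of them are commented out, so review the result against the `.bak` copy before restarting the pods.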

 

4) Restart ng-acc-configserver pods

kubectl scale --replicas=0 deployment/ng-acc-configserver-db-deployment -n<namespace>
kubectl scale --replicas=1 deployment/ng-acc-configserver-db-deployment -n<namespace>
kubectl scale --replicas=0 deployment/ng-acc-configserver-deployment -n<namespace>
kubectl scale --replicas=1 deployment/ng-acc-configserver-deployment -n<namespace>
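
The commands above restart the database first and then the config server. This is a minimal sketch that wraps them in one function and waits for each deployment to become ready before moving on. `restart_acc_configserver` and the `KUBECTL` override are hypothetical, and the 300s timeout is an arbitrary assumption.

```shell
# Restart the ACC database, then the config server, waiting for each rollout.
# $1: namespace. Set KUBECTL=echo to print the commands instead of running them.
restart_acc_configserver() {
  ns=$1
  for d in ng-acc-configserver-db-deployment ng-acc-configserver-deployment; do
    "${KUBECTL:-kubectl}" scale --replicas=0 "deployment/$d" -n "$ns" || return 1
    "${KUBECTL:-kubectl}" scale --replicas=1 "deployment/$d" -n "$ns" || return 1
    # Block until the new pod is ready or the timeout expires
    "${KUBECTL:-kubectl}" rollout status "deployment/$d" -n "$ns" --timeout=300s || return 1
  done
}

# Usage:
#   restart_acc_configserver <namespace>
```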

 

5) Check the postgres and apmccsrv logs

a) postgres logs:

cd <nfs-dir>/acc/cs/db-logs/

ls -ltr  (to get latest logs at the end of the list)

tail -f postgresql-<timestamp>.log

 

b) apmccsrv log

cd <nfs-dir>/acc/cs/logs/

ls -ltr   

tail -f apmccsrv-<pod>.log

 

 

Additional Information

DX AIOPs - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815