Unable to login to DX UI
“Failed to login. Verify that user ID and password are correct”
Corruption of the Postgres database
Possible reasons:
1) The NFS service stops working
2) The NFS server runs out of disk space
Symptoms:
Use-case #1: the postgres pod is in "CrashLoopBackOff" status
The pod is not running and you cannot open a shell in it:
kubectl get pods | grep postgres
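Example output for this use-case (illustrative only; the pod name, restart count and age will differ in your environment):
postgresql-77c878cc47-76hwm   0/1   CrashLoopBackOff   12   35m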
Use-case #2: some of the databases are corrupted, in this example "dsp_db"
The postgres pod is in "Running" status:
kubectl get pods | grep postgres
In the postgres log, there are messages indicating that the dsp_db database is corrupted:
cd <NFS>/axaservices/pg-data/userdata/pg_log
tail -f postgresql-<day>.log
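The exact messages depend on the type of corruption; errors similar to the following (illustrative examples, not taken from this system) typically point to a damaged database:
ERROR:  invalid page in block 1234 of relation base/16384/16501
ERROR:  could not read block 5678 in file "base/16384/16502": read only 0 of 8192 bytes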
NOTE: You can list all the databases in postgres as follows:
a) obtain the postgres pod name
kubectl get pods -ndxi | grep post
postgresql-77c878cc47-76hwm 1/1 Running 0 26s
b) login to pod
kubectl exec -it postgresql-77c878cc47-76hwm -ndxi bash
c) list databases:
psql -U postgres -d postgres
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------+----------+----------+------------+------------+-----------------------
aoplatform | aopuser | UTF8 | en_US.utf8 | en_US.utf8 | =Tc/aopuser +
| | | | | aopuser=CTc/aopuser
apmpe | apmpe | UTF8 | en_US.utf8 | en_US.utf8 |
cpa | aopuser | UTF8 | en_US.utf8 | en_US.utf8 | =Tc/aopuser +
| | | | | aopuser=CTc/aopuser
doi | aopuser | UTF8 | en_US.utf8 | en_US.utf8 | =Tc/aopuser +
| | | | | aopuser=CTc/aopuser
dsp_db | aopuser | UTF8 | en_US.utf8 | en_US.utf8 | =Tc/aopuser +
| | | | | aopuser=CTc/aopuser
dxi | dxi | UTF8 | en_US.utf8 | en_US.utf8 |
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
(9 rows)
postgres=# \q
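If you suspect a specific database (dsp_db in this example), a quick way to confirm whether it is readable is to connect to it and list its tables; a corrupted database typically fails here with errors like the ones shown in the log above:
psql -U postgres -d dsp_db -c "\dt"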
Environment:
DX Operational Intelligence 1.3.x, 20.x+
DX Application Performance Management 11.x, 20.x+
Recommendation #1: Make sure the NFS service is up and running
a) make sure the nfs server is up and running:
systemctl status nfs-server
if it is not running, restart it:
systemctl restart nfs-server
b) showmount should list the exported directories and allowed clients as defined in /etc/exports on the NFS server; run the command below from all OpenShift/Kubernetes nodes:
showmount -e <NFS server>
if you get the error below, it means there is a communication or network issue:
clnt_create: RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)
make sure the firewall is configured correctly on the NFS server:
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
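After adjusting the firewall, re-run showmount -e <NFS server> from the nodes to confirm that the exports are now visible. Since running out of disk space on the NFS server is the other possible cause listed above, it is also worth checking free space on the export (path shown as a placeholder, adjust to your environment):
df -h <NFS>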
Recommendation #2: Restore the postgres database backup
IMPORTANT: one of the best practices is to regularly back up the postgres database so that it can be restored in case of postgres db corruption.
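As a hedged example only (the exact procedure depends on how your backups are taken and on your environment), a logical backup and restore of a single database such as dsp_db could look like the following, run from inside the postgres pod; the dump file path is an arbitrary example:
# take a regular backup (custom format) of dsp_db
pg_dump -U postgres -Fc -f /tmp/dsp_db.dump dsp_db
# to restore after a corruption: drop and recreate the database with its original owner, then restore
dropdb -U postgres dsp_db
createdb -U postgres -O aopuser dsp_db
pg_restore -U postgres -d dsp_db /tmp/dsp_db.dump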