AIOps - Unable to login due to PODs connectivity issues

Products

DX Operational Intelligence
CA App Experience Analytics
DX Application Performance Management

Issue/Introduction

Symptoms: "Failed to login. Verify that user ID and password are correct"

The browser developer tools show the following response:

{"error":{"code":0,"message":"GENERIC_SERVICE_ERROR","traceId":"19e3aad71bcb70d7"}}

Environment

DX Platform 2x
DX AIOps 2x

Cause

Timing issues, network delays, insufficient hardware capacity, or resource constraints affecting connectivity between pods.

Resolution

Option 1: Review network, hardware and software recommendations

AIOps - Performance Recommendations

Option 2: Restart all services

cd <DXPlatform-Installer-HOME>/tools

./dx-admin.sh stop

Check all pods have been terminated using: kubectl get pods -n<namespace>

./dx-admin.sh start

Check all pods are up and running using: kubectl get pods -n<namespace>
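The check after the restart can be scripted. The following is a minimal sketch, not part of the product tooling: the function names pods_not_running and wait_all_running are hypothetical, and the parsing assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helpers (not shipped with DX Platform).
# Assumes the default "kubectl get pods" columns: NAME READY STATUS RESTARTS AGE.

# Print the names of pods that are neither Completed nor fully ready and Running.
pods_not_running() {
  ns="$1"
  kubectl get pods -n "$ns" --no-headers |
    awk '{ split($2, r, "/") }
         $3 != "Completed" && ($3 != "Running" || r[1] != r[2]) { print $1 }'
}

# Poll until no pod is reported by pods_not_running, or give up.
# Arguments: namespace, number of attempts (default 60), delay in seconds (default 10).
wait_all_running() {
  ns="$1"; tries="${2:-60}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ -z "$(pods_not_running "$ns")" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

After ./dx-admin.sh start, calling wait_all_running with the namespace returns 0 once every pod is ready, or 1 if the attempts run out. Any pod names printed by pods_not_running are the ones to inspect first.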

Option 3: Restart only the pods required for the login process

1) Scale down the following deployments:

kubectl scale --replicas=0 deployment doi-adminui -n<namespace>
kubectl scale --replicas=0 deployment doireadserver -n<namespace>
kubectl scale --replicas=0 deployment apmservices-manager-001 -n<namespace>
kubectl scale --replicas=0 deployment dxi-adminui -n<namespace>
kubectl scale --replicas=0 deployment dxi-readserver -n<namespace>
kubectl scale --replicas=0 deployment axaservices-readserver -n<namespace>
kubectl scale --replicas=0 deployment axaservices-amq -n<namespace>
kubectl scale --replicas=0 deployment dxi-postgresql -n<namespace>

2) Verify that all pods have been terminated
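Steps 1 and 2 can be combined in a short script. This is only a sketch: the function and variable names are hypothetical, and the termination check relies on the standard Kubernetes behavior that pod names start with the name of the deployment that owns them.

```shell
# Deployments involved in the login process, in the scale-down order listed above.
LOGIN_DEPLOYMENTS="doi-adminui doireadserver apmservices-manager-001 dxi-adminui dxi-readserver axaservices-readserver axaservices-amq dxi-postgresql"

# Hypothetical helper: scale every login-related deployment down to zero replicas.
scale_down_login_deployments() {
  ns="$1"
  for d in $LOGIN_DEPLOYMENTS; do
    kubectl scale --replicas=0 deployment "$d" -n "$ns" || return 1
  done
}

# Hypothetical helper: print any pods that still belong to those deployments.
# Empty output means all of them have been terminated.
remaining_login_pods() {
  ns="$1"
  for d in $LOGIN_DEPLOYMENTS; do
    kubectl get pods -n "$ns" --no-headers 2>/dev/null |
      awk -v p="$d-" 'index($1, p) == 1 { print $1 }'
  done
}
```

Run scale_down_login_deployments with the namespace, then repeat remaining_login_pods until it prints nothing before moving on to the scale-up step.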

3) Scale up the deployments one at a time, in the order below.

IMPORTANT: Make sure each pod starts correctly before scaling up the next deployment by checking the pod log with: kubectl logs -f <pod-name> -n <namespace>

kubectl scale --replicas=1 deployment dxi-postgresql -n<namespace>

Wait for the lines below to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment axaservices-amq -n<namespace>

Wait for the lines below to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment axaservices-readserver -n<namespace>

This process can take several minutes to start up. Wait for the lines below to appear in the pod log before starting the next pod:

NOTE: You can ignore the following ERROR messages:

[EMMCacheRefresher,mdo-serverCacheRefreshCheck] ERROR [] - TID[14015724] 5271596: JMS Request processing timeout, please wait for background processing
com.ca.emm.corejsvr.ExceptionWithNC: 5271596: JMS Request processing timeout, please wait for background processing

..

2021-11-30 14:03:09,521 [EMMCacheRefresher,mdo-serverCacheRefreshCheck] ERROR [] - TID[14015724] 3011599: Internal Error: Unable to complete this cache refresh cycle: 5271596: JMS Request processing timeout, please wait for background processing

kubectl scale --replicas=1 deployment dxi-readserver -n<namespace>

This process can take several minutes to start up. Wait for the lines below to appear in the pod log before starting the next pod.

In addition, search the log for: Successfully connected to tcp://axaservices-amq:61616
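Searching the log for a known entry can be automated instead of watching kubectl logs -f by hand. The sketch below uses a hypothetical helper named wait_for_log_line. It polls kubectl logs for the deployment and looks for a fixed string, and it assumes the deployment runs a single pod, as in the scale commands in this article.

```shell
# Hypothetical helper: wait until a fixed string shows up in a deployment's log.
# Arguments: namespace, deployment, text to find,
#            number of attempts (default 60), delay in seconds (default 10).
wait_for_log_line() {
  ns="$1"; deploy="$2"; pattern="$3"; tries="${4:-60}"; delay="${5:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # grep -F treats the pattern as literal text, -q only sets the exit status.
    if kubectl logs "deployment/$deploy" -n "$ns" 2>/dev/null | grep -qF -- "$pattern"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example call (replace my-namespace with the real namespace):
#   wait_for_log_line my-namespace dxi-readserver \
#     "Successfully connected to tcp://axaservices-amq:61616"
```

The helper returns 0 as soon as the text is found and 1 if it never appears within the allotted attempts, so it can gate the next scale command in a script.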

kubectl scale --replicas=1 deployment dxi-adminui -n<namespace>

Wait for the lines below to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment apmservices-manager-001 -n<namespace>

Wait for the lines below to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment doireadserver -n<namespace>

This process can take several minutes to start up. Wait for the lines below to appear in the pod log before starting the next pod.

Search the log for the "Server startup" entry.

kubectl scale --replicas=1 deployment doi-adminui -n<namespace>

Wait for the lines below to appear in the pod log:

4) Finally, verify that all pods are up and running, for example with: kubectl get pods -n<namespace>

5) Log in to the DX UI
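The ordered scale-up in step 3 can be sketched as a loop. Note the substitution: instead of reading each pod log, this sketch waits with kubectl rollout status, which only reports that the deployment's pods are ready according to their probes. That is an assumption about what counts as "started" and is weaker than the log checks described above, so the logs should still be reviewed if login keeps failing. The names START_ORDER and scale_up_in_order are hypothetical.

```shell
# Deployments in the start-up order given in step 3.
START_ORDER="dxi-postgresql axaservices-amq axaservices-readserver dxi-readserver dxi-adminui apmservices-manager-001 doireadserver doi-adminui"

# Hypothetical helper: scale up one deployment at a time and wait for each
# rollout to finish before starting the next one.
# Arguments: namespace, per-deployment timeout (default 600s).
scale_up_in_order() {
  ns="$1"; timeout="${2:-600s}"
  for d in $START_ORDER; do
    echo "Starting $d"
    kubectl scale --replicas=1 deployment "$d" -n "$ns" || return 1
    # Stops the sequence if the deployment does not become ready in time.
    kubectl rollout status "deployment/$d" -n "$ns" --timeout="$timeout" || return 1
  done
}
```

Stopping at the first deployment that fails to become ready keeps the dependency order intact and points directly at the pod whose log needs attention.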

What to collect if the problem persists

Collect the following information and contact Broadcom Support.

- dx-platform install log
- kubectl describe nodes
- kubectl get events -n <your-dxi-namespace>
- kubectl get pods -n <your-dxi-namespace>
- free -h and df -h from each node, master and NFS server
- The apmservices gateway, dxi-adminui, dxi-readserver, dxi-amq and apmservices-manager logs from the dxi location on the NFS server
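Most of the command output in the list above can be gathered in one pass. The following is a rough sketch with a hypothetical function name and output file names. It captures free and df only for the host it runs on, so it would need to be repeated on each node, the master and the NFS server, and it does not copy the install log or the component logs from NFS.

```shell
# Hypothetical helper: save cluster and host diagnostics into a directory.
# Arguments: namespace, output directory (default dx-diagnostics).
collect_dx_diagnostics() {
  ns="$1"; out="${2:-dx-diagnostics}"
  host="$(uname -n)"
  mkdir -p "$out" || return 1
  kubectl describe nodes       > "$out/describe-nodes.txt" 2>&1
  kubectl get events -n "$ns"  > "$out/events.txt" 2>&1
  kubectl get pods -n "$ns"    > "$out/pods.txt" 2>&1
  # Memory and disk usage of the local host only.
  free -h > "$out/free-$host.txt" 2>&1
  df -h   > "$out/df-$host.txt" 2>&1
}
```

The resulting directory can be archived and attached to the support case together with the dx-platform install log and the logs from the NFS location.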

Additional Information

DX AIOPs - Troubleshooting, Common Issues and Best Practices