AIOps - Unable to log in due to pod connectivity issues

Products

DX Operational Intelligence, CA App Experience Analytics, DX Application Performance Management

Issue/Introduction

Symptoms:

Login to AIOps fails. The browser developer tools show the following response:

{"error":{"code":0,"message":"GENERIC_SERVICE_ERROR","traceId":"19e3aad71bcb70d7"}}

Environment

DX Platform 2x

Cause

Network issues affecting pod connectivity.

The "Hardware Requirements" section of the DX Platform documentation requires a network speed of 10 Gbps between all nodes. Verify that this requirement is met.

Resolution

Option 1: Restart all services

cd <DXPlatform-Installer-HOME>/tools

./dx-admin.sh stop

Check that all pods have been terminated: kubectl get pods -n<namespace>

./dx-admin.sh start

Check that all pods are up and running: kubectl get pods -n<namespace>
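The two checks in Option 1 can be polled instead of being re-run by hand. The following is a minimal sketch, not part of dx-admin.sh: the function names (count_pods, count_not_ready, wait_for_zero) and the default timeout and interval are assumptions, and it relies only on the default column layout of "kubectl get pods" (NAME, READY, STATUS).

```shell
# Sketch only: function names, timeout and interval values are assumptions.

# Print the number of pods currently listed in namespace $1.
count_pods() {
  kubectl get pods -n "$1" --no-headers 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'
}

# Print the number of pods in namespace $1 that are neither Completed
# nor Running with all containers ready (READY column x/y with x == y).
count_not_ready() {
  kubectl get pods -n "$1" --no-headers 2>/dev/null | awk '
    NF {
      split($2, r, "/")
      if ($3 != "Completed" && ($3 != "Running" || r[1] != r[2])) n++
    }
    END { print n + 0 }'
}

# Poll counter function $1 for namespace $2 until it prints 0.
# $3 = timeout in seconds (default 900), $4 = poll interval (default 10).
# Returns 1 if the timeout is reached first.
wait_for_zero() {
  counter="$1"; ns="$2"; timeout="${3:-900}"; interval="${4:-10}"; waited=0
  while [ "$("$counter" "$ns")" -gt 0 ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Example (namespace is a placeholder):
#   ./dx-admin.sh stop  && wait_for_zero count_pods <namespace>
#   ./dx-admin.sh start && wait_for_zero count_not_ready <namespace>
```

Note that count_not_ready also prints 0 while no pods exist yet, so immediately after a start it is worth confirming with kubectl get pods that the pods have actually been created.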

Option 2: Restart only the pods required for the login process

For 21.x / 22.x

1) Scale down the following deployments:

kubectl scale --replicas=0 deployment doi-adminui -n<namespace>
kubectl scale --replicas=0 deployment doireadserver -n<namespace>
kubectl scale --replicas=0 deployment apmservices-manager-001 -n<namespace>
kubectl scale --replicas=0 deployment dxi-adminui -n<namespace>
kubectl scale --replicas=0 deployment dxi-readserver -n<namespace>
kubectl scale --replicas=0 deployment axaservices-readserver -n<namespace>
kubectl scale --replicas=0 deployment axaservices-amq -n<namespace>
kubectl scale --replicas=0 deployment dxi-postgresql -n<namespace>
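The eight commands above can also be issued from a loop. The following is a minimal sketch: the function name scale_down_login_deployments and the variable LOGIN_DEPLOYMENTS are hypothetical, while the deployment names and their order are copied from the list above.

```shell
# Sketch only: the function and variable names are assumptions.
# Deployment names are the ones listed in step 1, in the same order.
LOGIN_DEPLOYMENTS="doi-adminui doireadserver apmservices-manager-001 dxi-adminui dxi-readserver axaservices-readserver axaservices-amq dxi-postgresql"

# Scale every deployment in LOGIN_DEPLOYMENTS to 0 replicas in namespace $1.
# Stops at the first failure so a missing deployment or typo is noticed.
scale_down_login_deployments() {
  ns="$1"
  for d in $LOGIN_DEPLOYMENTS; do
    kubectl scale --replicas=0 deployment "$d" -n "$ns" || return 1
  done
}

# Example (namespace is a placeholder):
#   scale_down_login_deployments <namespace>
```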

2) Verify that all pods have been terminated
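One way to perform this verification is to filter the pod list by the deployment names from step 1. The sketch below assumes the standard Kubernetes naming of Deployment pods (<deployment>-<hash>-<hash>); the function name remaining_login_pods is hypothetical.

```shell
# Sketch only: the function name is an assumption.

# Print the names of pods in namespace $1 that still belong to the
# deployments scaled down in step 1 (matched by pod-name prefix).
# Returns non-zero when no such pod is left.
remaining_login_pods() {
  kubectl get pods -n "$1" --no-headers 2>/dev/null |
    awk '{ print $1 }' |
    grep -E '^(doi-adminui|doireadserver|apmservices-manager-001|dxi-adminui|dxi-readserver|axaservices-readserver|axaservices-amq|dxi-postgresql)-'
}

# Example (namespace is a placeholder):
#   if remaining_login_pods <namespace>; then
#     echo "Pods listed above are still terminating"
#   fi
```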

3) Scale up the deployments one at a time, in the order listed below.

IMPORTANT: Make sure each pod starts correctly by checking its log with: kubectl logs -f <pod-name> -n <namespace>

kubectl scale --replicas=1 deployment dxi-postgresql -n<namespace>

Wait for the following lines to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment axaservices-amq -n<namespace>

Wait for the following lines to appear in the pod log before starting the next pod:

kubectl scale --replicas=1 deployment axaservices-readserver -n<namespace>

This deployment can take several minutes to start up. Wait for the following lines to appear in the pod log before starting the next pod:

NOTE: You can ignore the following ERROR messages:

[EMMCacheRefresher,mdo-serverCacheRefreshCheck] ERROR [] - TID[14015724] 5271596: JMS Request processing timeout, please wait for background processing
com.ca.emm.corejsvr.ExceptionWithNC: 5271596: JMS Request processing timeout, please wait for background processing

..

2021-11-30 14:03:09,521 [EMMCacheRefresher,mdo-serverCacheRefreshCheck] ERROR [] - TID[14015724] 3011599: Internal Error: Unable to complete this cache refresh cycle: 5271596: JMS Request processing timeout, please wait for background processing

kubectl scale --replicas=1 deployment dxi-readserver -n<namespace>

This deployment can take several minutes to start up. Wait for the following lines to appear in the pod log before starting the next pod.

In addition, search for: Successfully connected to tcp://axaservices-amq:61616
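Waiting for a specific log line can be scripted rather than watched with kubectl logs -f. The following is a minimal sketch: the function name wait_for_log_line and the default timeout and interval are assumptions, and it presumes the deployment runs a single replica, since "kubectl logs deployment/<name>" reads from one pod of the deployment.

```shell
# Sketch only: function name, timeout and interval values are assumptions.

# Return 0 once the fixed string $3 appears in the log of deployment $2
# in namespace $1. Return 1 if it has not appeared after $4 seconds
# (default 900). $5 = poll interval in seconds (default 10).
wait_for_log_line() {
  ns="$1"; deploy="$2"; pattern="$3"; timeout="${4:-900}"; interval="${5:-10}"; waited=0
  until kubectl logs "deployment/$deploy" -n "$ns" 2>/dev/null | grep -qF -- "$pattern"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Example (namespace is a placeholder):
#   kubectl scale --replicas=1 deployment dxi-readserver -n <namespace>
#   wait_for_log_line <namespace> dxi-readserver \
#     "Successfully connected to tcp://axaservices-amq:61616"
```

The same function can be reused for the other deployments by passing the log line expected for each of them.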