DX OI - DSP integrator is running at almost 200 % CPU and full memory usage

Article ID: 221548

Products

DX Operational Intelligence, DX Application Performance Management, CA App Experience Analytics

Issue/Introduction

In DX Platform, Cluster Management shows that the DSP integrator is running at almost 200% CPU and full memory usage.
 

 

Environment

DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x

Cause

The dspintegrator log contains the following message:

WARN  DspProcessorThread:postCsvInputDataToDspcasa:319 [dspi-dspprocessor-thread-3]  - failure count exceeded

The "failure count exceeded" warning in the postCsv call means that dspintegrator is not communicating with the dspcasa1-scoring server.

This indicates that there was a problem communicating with the Postgres database, which is preventing requests from being handled.
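
To confirm the cause, it can help to count how often the warning appears in the dspintegrator logs. The following is a minimal sketch, assuming the logs are plain-text files (for example under <nfs>/ca/dxi/doiservices/dspintegrator/); the function name count_failure_warnings is hypothetical and not part of the product.

```shell
# Count the lines containing the warning across the given log files.
# Usage: count_failure_warnings <nfs>/ca/dxi/doiservices/dspintegrator/*
count_failure_warnings() {
    # cat merges all files so that grep -c prints one total;
    # "|| true" keeps the exit status 0 when the count is 0.
    cat -- "$@" 2>/dev/null | grep -c 'failure count exceeded' || true
}
```

A steadily growing count suggests the communication problem is still ongoing.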

 

Resolution

1) Log in to the Kubernetes master

2) Scale down dspintegrator, dspcasa1 and dspcasa at the same time:

kubectl scale --replicas=0 deployment doi-dspintegrator -n<namespace>
kubectl scale --replicas=0 deployment doi-dspcasa1 -n<namespace>
kubectl scale --replicas=0 deployment doi-dspcasa -n<namespace>

3) Verify that all of the above pods have been stopped:

kubectl get pods -n<namespace> | grep dsp
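
Pods can take a short while to terminate, so the check above may need to be repeated. A minimal sketch of a polling loop, assuming that any pod name containing "dsp" belongs to the deployments scaled down in step 2; the function name and its default attempt count and delay are hypothetical.

```shell
# Poll until no pod whose name contains "dsp" is listed, or give up.
# Usage: wait_for_dsp_pods_gone <namespace> [max_attempts] [sleep_seconds]
wait_for_dsp_pods_gone() {
    ns=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --no-headers drops the column header line from the listing
        if ! kubectl get pods -n "$ns" --no-headers 2>/dev/null | grep -q dsp; then
            echo "no dsp pods listed in $ns"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "dsp pods still listed in $ns after $attempts checks" >&2
    return 1
}
```

For example, wait_for_dsp_pods_gone <namespace> checks every 10 seconds for up to 30 attempts.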

4) Scale up dspcasa, dspcasa1 and dspintegrator one at a time. Wait 2 to 3 minutes before starting the next pod to ensure each pod starts successfully.

Tip: check the pod logs using: kubectl logs <pod-name> -n<namespace>

kubectl scale --replicas=1 deployment doi-dspcasa -n<namespace>
kubectl scale --replicas=1 deployment doi-dspcasa1 -n<namespace>
kubectl scale --replicas=1 deployment doi-dspintegrator -n<namespace>
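
The one-by-one scale-up can also be scripted. This is a sketch under stated assumptions: each deployment normally runs with 1 replica (adjust if your installation differs), and kubectl rollout status is used in place of a fixed 2 to 3 minute wait. The function name scale_up_dsp is hypothetical.

```shell
# Scale the three deployments back up in the order listed above,
# waiting for each rollout to finish before starting the next.
# Usage: scale_up_dsp <namespace> [replicas] [timeout]
scale_up_dsp() {
    ns=$1
    replicas=${2:-1}      # assumption: one replica per deployment
    timeout=${3:-180s}    # roughly the 3-minute wait from step 4
    for dep in doi-dspcasa doi-dspcasa1 doi-dspintegrator; do
        kubectl scale --replicas="$replicas" deployment "$dep" -n "$ns" || return 1
        # Blocks until the deployment is rolled out or the timeout expires
        kubectl rollout status deployment "$dep" -n "$ns" --timeout="$timeout" || return 1
    done
}
```

If a rollout does not complete within the timeout, the function stops so that the failing pod's logs can be checked before continuing.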

5) Log in to Cluster Management as masteradmin and verify that DSP health is back to normal


If the problem persists, collect the information below to troubleshoot issues related to DSP, and contact Broadcom Support.

1) dspintegrator and postgres logs:

<nfs>/ca/dxi/doiservices/dspintegrator/*
<nfs>/ca/dxi/axaservices/pg-data/userdata/pg_log/*
 
2) Logs from the dspcasa pods
 
a)
kubectl get pods -ndxi | grep dsp
kubectl logs -f <doi-dspcasa-pod> -ndxi

b)
kubectl exec -ti <doi-dspcasa-pod> -ndxi -- bash
cd /opt/dsp/dsp_logs/

Collect all files.
 
3) Status of the Kubernetes setup
 
kubectl describe nodes
 
4) List of indices sorted by size
 
http(s)://<ELASTIC_URL>/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,docs.count,docs.deleted,index,cds
http(s)://<ELASTIC_URL>/_cat/indices?v
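
The same listing can be fetched from a terminal and saved to a file for the support case. A minimal sketch, assuming the Elasticsearch endpoint is reachable from the host and does not require authentication (add credentials or TLS options to curl if it does); the function name es_indices_by_size is hypothetical.

```shell
# Print the Elasticsearch indices sorted by store size, largest first.
# Usage: es_indices_by_size http(s)://<ELASTIC_URL> > indices.txt
es_indices_by_size() {
    # v = show column headers, s = sort key, h = columns to display
    curl -s "$1/_cat/indices?v&s=store.size:desc&h=health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size"
}
```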

5) Size of Postgres databases:

a) obtain the postgres pod name
 
example:
kubectl get pods -ndxi | grep post
postgresql-77c878cc47-76hwm                          1/1       Running       0          26s

b) login to pod
 
example:
kubectl exec -it postgresql-77c878cc47-76hwm -ndxi bash

c)  list of databases by size:
 
psql -U aopuser -d aoplatform

SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;

For example:

  database_name   | size_in_mb
------------------+------------
 dsp_db           |       4189
 aoplatform       |        554
 dspintegrator_db |         79
 doi              |          8
 apmpe            |          6
 grafana_db       |          6
 postgres         |          6
 dxi              |          6
 template1        |          6
 template0        |          6
 cpa              |          6
(11 rows) 
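
The size query above can also be run without opening an interactive shell in the pod. A sketch, assuming psql inside the pod accepts the aopuser connection without prompting for a password and that the namespace is dxi as in the examples; the function name db_sizes is hypothetical.

```shell
# Run the database-size query inside the postgres pod and print the result.
# Usage: db_sizes <postgres-pod-name> [namespace]
db_sizes() {
    pod=$1
    ns=${2:-dxi}
    # Everything after "--" is executed inside the pod
    kubectl exec "$pod" -n "$ns" -- psql -U aopuser -d aoplatform -c \
        'SELECT datname AS database_name, pg_database_size(datname)/1024/1024 AS size_in_mb FROM pg_database ORDER BY size_in_mb DESC;'
}
```

For example, db_sizes postgresql-77c878cc47-76hwm > db_sizes.txt saves the output to a file that can be attached to the support case.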
 
psql -U aopuser -d dsp_db
select * from dsp_operation_status;
 
 
 

Additional Information

DX AIOPs Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815