DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x
The following message appears in the dspintegrator log:
WARN DspProcessorThread:postCsvInputDataToDspcasa:319 [dspi-dspprocessor-thread-3] - failure count exceeded
The "failure count exceeded" warning in the postCsv call means that dspintegrator is not communicating with the dspcasa1-scoring server.
This indicates a problem communicating with the postgres database, which prevents the scoring server from handling requests.
1) Log in to the Kubernetes master
2) Scale down dspintegrator, dspcasa1, and dspcasa together:
kubectl scale --replicas=0 deployment doi-dspintegrator -n<namespace>
kubectl scale --replicas=0 deployment doi-dspcasa1 -n<namespace>
kubectl scale --replicas=0 deployment doi-dspcasa -n<namespace>
3) Verify that all of the above pods have stopped:
kubectl get pods -n <namespace> | grep dsp
4) Scale up dspcasa, dspcasa1, and dspintegrator one at a time. Wait 2 to 3 minutes before starting the next pod to ensure that each pod starts successfully.
Tip: check the pod logs using: kubectl logs <pod-name> -n<namespace>
kubectl scale --replicas=1 deployment doi-dspcasa -n<namespace>
kubectl scale --replicas=1 deployment doi-dspcasa1 -n<namespace>
kubectl scale --replicas=1 deployment doi-dspintegrator -n<namespace>
5) Log in to Cluster Management as masteradmin and verify that DSP health is back to normal.
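Steps 2 to 4 can be sketched as a small set of shell functions. This is a minimal sketch, not a supported Broadcom script. It assumes that each deployment normally runs a single replica, that kubectl is already configured for the cluster, and that waiting on kubectl rollout status with a 180 second timeout is an acceptable substitute for the manual 2 to 3 minute pause. The function names and the POLL_SECONDS variable are hypothetical.

```shell
# Deployment names are taken from the steps above.
DSP_DEPLOYMENTS="doi-dspintegrator doi-dspcasa1 doi-dspcasa"

# Step 2: scale all three DSP deployments down to zero replicas.
scale_down_dsp() {
  local namespace="$1"
  local deploy
  for deploy in $DSP_DEPLOYMENTS; do
    kubectl scale --replicas=0 deployment "$deploy" -n "$namespace" || return 1
  done
}

# Step 3: poll until no pod name containing "dsp" is listed.
# Returns 0 when the pods are gone, 1 on timeout, 2 if kubectl itself fails.
wait_for_dsp_pods_stopped() {
  local namespace="$1"
  local attempts="${2:-30}"
  local pods
  while [ "$attempts" -gt 0 ]; do
    pods=$(kubectl get pods -n "$namespace" --no-headers) || return 2
    if ! printf '%s\n' "$pods" | grep -q dsp; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}

# Step 4: scale up one deployment at a time and wait for each rollout
# to finish before starting the next one.
# Assumption: one replica per deployment.
scale_up_dsp() {
  local namespace="$1"
  local deploy
  for deploy in doi-dspcasa doi-dspcasa1 doi-dspintegrator; do
    kubectl scale --replicas=1 deployment "$deploy" -n "$namespace" || return 1
    kubectl rollout status deployment "$deploy" -n "$namespace" --timeout=180s || return 1
  done
}
```

A typical run would be: scale_down_dsp <namespace> && wait_for_dsp_pods_stopped <namespace> && scale_up_dsp <namespace>. If a rollout does not complete within the timeout, the function stops so that the pod logs can be checked before the next deployment is started.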
If the problem persists, collect the following information to troubleshoot DSP-related issues and contact Broadcom Support:
 database_name    | size_in_mb
------------------+------------
 dsp_db           |       4189
 aoplatform       |        554
 dspintegrator_db |         79
 doi              |          8
 apmpe            |          6
 grafana_db       |          6
 postgres         |          6
 dxi              |          6
 template1        |          6
 template0        |          6
 cpa              |          6
(11 rows)
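A listing of database sizes like the one above can be produced with a query against the PostgreSQL catalog. The article does not state the exact query, so the following is only a sketch that yields the same two columns. The function name dsp_db_sizes_sql is hypothetical, and how to reach the postgres instance (pod name, user, password) depends on the deployment.

```shell
# Print a query that lists every database with its size in MB, largest first.
# pg_database and pg_database_size() are standard PostgreSQL objects.
dsp_db_sizes_sql() {
  cat <<'SQL'
SELECT datname AS database_name,
       pg_database_size(datname) / 1024 / 1024 AS size_in_mb
FROM pg_database
ORDER BY size_in_mb DESC;
SQL
}
```

The output can be piped into psql from wherever the database is reachable, for example: dsp_db_sizes_sql | psql -U <user> -d postgres, where the user and connection details are assumptions to be replaced with the values for your environment.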
DX AIOPs Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815