
AIOps 20.2.1 - NFS disk full due to Data Science Platform (DSP) feature


Article ID: 233929


Products

DX Operational Intelligence DX Application Performance Management

Issue/Introduction

The NFS disk is full because of the Data Science Platform (DSP) feature. Which data or files can be deleted?

Cause

Troubleshooting

1) Find out which data is consuming the most disk space:

du -shc <dxi_nfs>/* | sort -h

7.5G /var/nfs/doi/em
17G /var/nfs/doi/doiservices
97G /var/nfs/doi/apmservices
1.8T /var/nfs/doi/axaservices
1.9T total

du -shc <dxi_nfs>/axaservices/* | sort -h

4.0K /var/nfs/doi/axaservices/adminui-data
4.0K /var/nfs/doi/axaservices/amq-data
156M /var/nfs/doi/axaservices/dxi-notify
3.5G /var/nfs/doi/axaservices/dxi-adminui
4.5G /var/nfs/doi/axaservices/dxi-readserver
1.8T /var/nfs/doi/axaservices/pg-data
1.8T total

The above results indicate that postgres (the pg-data directory) is consuming the space.
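
The two du passes above can be repeated level by level with a small helper. This is a minimal sketch, not part of the product: the function name largest_child is hypothetical, and it assumes du, sort, tail, and cut are available on the host that mounts the NFS.

```shell
# Hypothetical helper: print the largest immediate child of a directory.
# du -sk reports sizes in KiB so that sort -n can compare them numerically.
largest_child() {
  du -sk "$1"/* 2>/dev/null | sort -n | tail -n 1 | cut -f2-
}
```

Calling largest_child <dxi_nfs> and then calling it again on the path it prints narrows the search one directory level at a time, which is how the listing above arrives at pg-data.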

 

2) Connect to the postgres database and list the database sizes:

a) Log in to the postgres pod

kubectl get pods -n<namespace> | grep postgres

kubectl exec -it <postgres-pod> -n<namespace> -- bash

b) Run psql from the terminal

$ psql

c) List the databases (the output includes the size of each database)

postgres=# \l+
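
As an alternative to reading the \l+ table by eye, the sizes can be queried in bytes and filtered. The following is a sketch under assumptions: the helper name large_dbs and the threshold are hypothetical, while pg_database and pg_database_size() are standard PostgreSQL.

```shell
# Hypothetical helper: read "name|bytes" lines on stdin and print the
# databases whose size is at or above the given threshold in bytes.
large_dbs() {
  awk -F'|' -v min="$1" '$2 + 0 >= min + 0 { print $1 }'
}
```

Inside the postgres pod it could be used as: psql -At -F'|' -c "SELECT datname, pg_database_size(datname) FROM pg_database" | large_dbs 107374182400, which would list the databases of 100 GiB or more.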

 

The Data Science Platform (DSP) feature for anomaly detection had processing issues and has since been replaced by the anomalydetection pod.

 

Environment

DX Operational Intelligence 20.2.1 only

Resolution

Fixed in 21.3.1. The dspintegrator is no longer used and has been removed from 21.3.1 onwards.

 

Workaround:

Stop the DSP feature and delete its databases as follows:

STEP#1:  Scale the DSP deployments (including 'dspintegrator') down to 0 so that they stop adding data to the database:

kubectl scale --replicas=0 deployment doi-dspcasa -n<namespace>
kubectl scale --replicas=0 deployment doi-dspcasa1 -n<namespace>
kubectl scale --replicas=0 deployment doi-dspintegrator -n<namespace>
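
The three commands above can be wrapped so that they can be previewed before anything is changed. This is only a sketch: the function name scale_down_dsp and the DRY_RUN variable are hypothetical, and the deployment names are the ones listed in STEP#1.

```shell
# Hypothetical wrapper: scale the three DSP deployments from STEP#1 to 0.
# With DRY_RUN=1 the kubectl commands are printed instead of executed.
scale_down_dsp() {
  for sd_dep in doi-dspcasa doi-dspcasa1 doi-dspintegrator; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl scale --replicas=0 deployment $sd_dep -n $1"
    else
      kubectl scale --replicas=0 deployment "$sd_dep" -n "$1" || return 1
    fi
  done
}
```

Running DRY_RUN=1 scale_down_dsp <namespace> prints the commands for review; running it without DRY_RUN executes them. Afterwards, kubectl get deployments -n<namespace> | grep dsp can be used to confirm that the deployments report no running replicas.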

 

STEP#2: Back up the required (good) databases so that they can be restored if any issue occurs:

1) Open a dxi-postgresql terminal:

kubectl get pods -n<namespace> | grep postgres
kubectl exec -it <postgres-pod> -n<namespace> -- bash

2) cd $PGDATA/
3) mkdir ./postgres_backup
4) Verify that the 'postgres_backup' directory exists on the NFS. Optionally, copy it out of the volume mount to another location as a second backup.
5) cd ./postgres_backup
6) Run the following to back up each required database (from within the 'postgres_backup' directory):

$ pg_dump --create -f ./aoplatform.dump aoplatform
$ pg_dump --create -f ./apmpe.dump apmpe
$ pg_dump --create -f ./cpa.dump cpa
$ pg_dump --create -f ./doi.dump doi
$ pg_dump --create -f ./dxi.dump dxi
$ pg_dump --create -f ./grafana_db.dump grafana_db

IMPORTANT NOTE: It is critical that these backups are taken, as we can restore the necessary databases in a new postgres instance from these if absolutely required.
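
Because the next step is destructive, it may be worth checking that all six dump files were actually written before continuing. The sketch below assumes the file names used in STEP#2; the function name verify_dumps is hypothetical, and the check only confirms that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical check: confirm that every expected dump file exists and is
# non-empty in the given directory. Prints missing files, returns 1 if any.
verify_dumps() {
  vd_status=0
  for vd_db in aoplatform apmpe cpa doi dxi grafana_db; do
    if [ ! -s "$1/$vd_db.dump" ]; then
      echo "missing or empty: $1/$vd_db.dump"
      vd_status=1
    fi
  done
  return $vd_status
}
```

From within the 'postgres_backup' directory, verify_dumps . prints nothing and returns 0 when all six dumps are present.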

 

STEP#3: Delete the DSP databases. Run the following from a psql session in the postgres pod:

drop database dsp_db;
drop database dspintegrator_db;

 
This may take a while. Each statement returns the message 'DROP DATABASE' once the database has been dropped. After that response, running:
 
> \l
 
no longer lists the dropped database.
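
The \l check can also be scripted. This is a minimal sketch: the function name remaining_dsp_dbs is hypothetical, and it only looks for the two database names dropped in STEP#3.

```shell
# Hypothetical check: read database names (one per line) on stdin and print
# any DSP database from STEP#3 that still exists. Returns 1 if one is found.
remaining_dsp_dbs() {
  if grep -x -e dsp_db -e dspintegrator_db; then
    return 1
  fi
  return 0
}
```

Inside the postgres pod it could be fed with: psql -At -c "SELECT datname FROM pg_database" | remaining_dsp_dbs. No output and a zero exit status mean that both databases are gone.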
 
 
If a catastrophic error occurs, the good databases can be recreated as follows:
 
1) Scale all deployments (including dxi-postgresql) to 0.
2) Open a shell and navigate to the NFS.
3) cd /<your_dxi_nfs>/axaservices/pg-data/
4) mv ./userdata ./userdata.old01
5) cp -r ./dxi ./dxi.old01
6) Start (scale up) the dxi-postgresql pod in a separate terminal or from the browser.
7) Still in the same directory as before, run 'ls'; it shows the newly created userdata and dxi directories.
8) Exec into the dxi-postgresql pod:
kubectl get pods -n<namespace> | grep postgres
kubectl exec -it <postgres-pod> -n<namespace> -- bash
9) cd $PGDATA
10) Listing the databases with psql should show only the postgres, template0, and template1 databases:
$ psql --list
11) Restore the individual databases:
$ cd ./postgres_backup;
$ psql < ./aoplatform.dump
$ psql < ./apmpe.dump
$ psql < ./cpa.dump
$ psql < ./doi.dump
$ psql < ./dxi.dump
$ psql < ./grafana_db.dump
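
The restore commands above keep going even if one dump fails part-way. A hedged alternative is sketched below: the function name restore_dbs and the DRY_RUN variable are hypothetical, while ON_ERROR_STOP is a standard psql variable that makes psql stop and return a non-zero exit status at the first SQL error.

```shell
# Hypothetical wrapper: restore each dump taken in STEP#2, stopping at the
# first failure. With DRY_RUN=1 the psql commands are printed, not executed.
restore_dbs() {
  for rd_db in aoplatform apmpe cpa doi dxi grafana_db; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "psql -v ON_ERROR_STOP=1 -f $1/$rd_db.dump"
    else
      psql -v ON_ERROR_STOP=1 -f "$1/$rd_db.dump" || return 1
    fi
  done
}
```

Running DRY_RUN=1 restore_dbs ./postgres_backup previews the commands; without DRY_RUN the dumps are restored in the same order as listed above.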

 

 

 

Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html#mcetoc_1f7qcopf91ur
