DX AIOps - apmservices-nass pod restarting : org.rocksdb.RocksDBException: Sst file size mismatch: ./data/nass_spooldb

Article ID: 255648

Products

DX Application Performance Management, DX Operational Intelligence

Issue/Introduction

The apmservices-nass pod keeps restarting, so performance metrics and inventory information are not accessible.

From the apmservices-nass pod log (apmservices-nass-pode.txt):

2022-12-06 08:57:43.349  WARN 1 --- [           main] onfigReactiveWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'schedulers': Unsatisfied dependency expressed through field 'metricStore'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'metricStore': Unsatisfied dependency expressed through field 'spoolManager'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'spoolManager': Unsatisfied dependency expressed through field 'familyManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'familyManager': Invocation of init method failed; nested exception is com.ca.apm.common.db.DatabaseException: java.io.IOException: org.rocksdb.RocksDBException: Sst file size mismatch: ./data/nass_spooldb/4240410.sst. Size recorded in manifest 83239, actual size 1080

...

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2022-12-06 08:57:43.660 ERROR 1 --- [           main] o.s.boot.SpringApplication               : Application run failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'schedulers': Unsatisfied dependency expressed through field 'metricStore'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'metricStore': Unsatisfied dependency expressed through field 'spoolManager'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'spoolManager': Unsatisfied dependency expressed through field 'familyManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'familyManager': Invocation of init method failed; nested exception is com.ca.apm.common.db.DatabaseException: java.io.IOException: org.rocksdb.RocksDBException: Sst file size mismatch: ./data/nass_spooldb/4240410.sst. Size recorded in manifest 83239, actual size 1080

        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:660)
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:640)
        at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
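
To check whether a restarting NASS pod is hitting this error, and which SST file is affected, the pod log can be filtered for the RocksDB message. The following is a minimal sketch, assuming a POSIX shell with sed; the function name sst_mismatches is hypothetical, and the pattern only matches the message format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read NASS log text on stdin and print each distinct
# RocksDB "Sst file size mismatch" as: <sst file> manifest=<bytes> actual=<bytes>
sst_mismatches() {
  sed -n 's/.*Sst file size mismatch: \([^ ]*\.sst\)\. Size recorded in manifest \([0-9]*\), actual size \([0-9]*\).*/\1 manifest=\2 actual=\3/p' |
    sort -u   # the same message appears in both the WARN and ERROR lines
}
```

For example, feed it the log of the last crashed container with oc logs --previous <apmservices-nass-pod> | sst_mismatches. A manifest size larger than the actual size, as in the log above (83239 vs 1080 bytes), suggests the SST file was truncated.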
        

Environment

DX Operational Intelligence 21.3.1
DX APM 21.3.1
DX Platform 21.3.1

Cause

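This exception means that the RocksDB database is corrupted: the size recorded in the manifest for an SST file (83239 bytes) does not match the file's actual size on disk (1080 bytes). The affected database can only be the spool DB (nass_spooldb), which holds only the last couple of minutes of data.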

This can happen in many ways, but it is mostly related to NFS connection issues and to use of the older NFS version 3. As per the documentation, NFS version 4.1 is recommended because it is much more resilient:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/21-3/installing/Hardware-software-requirements.html
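
To see which NFS version backs the spool directory, the mount table of a pod that mounts it (for example the DX manager pod used in the Resolution) can be inspected. A minimal sketch, assuming the Linux /proc/mounts format, where NFS mounts typically list their protocol version in the vers= option; the function name nfs_version_for is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type and NFS version of the mount
# that contains PATH ($1), using a Linux mounts file ($2, default /proc/mounts).
# Output: "nfs4 4.1", "nfs 3", ... or "not-nfs".
nfs_version_for() {
  awk -v p="$1" '
    BEGIN { best = -1 }
    {
      mp = $2
      # Keep the longest mount point that is PATH itself or a parent of it.
      if ((p == mp || mp == "/" || index(p, mp "/") == 1) && length(mp) > best) {
        best = length(mp); fs = $3; opts = $4
      }
    }
    END {
      if (fs != "nfs" && fs != "nfs4") { print "not-nfs"; exit }
      vers = "unknown"
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] ~ /^vers=/) vers = substr(o[i], 6)
      print fs, vers
    }' "${2:-/proc/mounts}"
}
```

For example, save the mount table with oc exec <manager-pod> -- cat /proc/mounts > mounts.txt, then run nfs_version_for /data.all/apmservices/nass-001/data mounts.txt. An output of "nfs 3" points to the older NFS version.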

Resolution

1. Scale down the apmservices-nass-001 deployment.

2. Identify the DX manager pod, which is used to clean up the DB, and open a remote shell in it (oc or kubectl):

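Example: oc rsh apmservices-manager-001-7588688454-9cqqd

With kubectl, which has no rsh command: kubectl exec -it apmservices-manager-001-7588688454-9cqqd -- sh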

3. Back up the NASS spool DB for analysis:

cd /data.all/apmservices/nass-001/data/

tar cvzf spooldbbackup.tar.gz nass_spooldb/

4. Clean up the spool DB:

rm nass_spooldb/*

5. Scale the apmservices-nass-001 deployment back up.
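
The steps above can be sketched as a single script run from a workstation with oc access, using oc exec instead of an interactive oc rsh session. This is a minimal, unofficial sketch: the namespace (NS, hypothetical default dxi), the replica count restored in step 5 (REPLICAS, hypothetical default 1) and the DRY_RUN switch are assumptions, so check the current replica count with oc get deployment apmservices-nass-001 before running it.

```shell
#!/bin/sh
# Sketch of the resolution steps. Assumptions (not from the article):
#   NS        - namespace of the DX AIOps install (hypothetical default "dxi")
#   REPLICAS  - replica count to restore in step 5 (hypothetical default 1)
#   DRY_RUN=1 - print the commands instead of running them
NASS_DEPLOY="apmservices-nass-001"
SPOOL_PARENT="/data.all/apmservices/nass-001/data"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

clean_nass_spool() {
  ns="${NS:-dxi}"
  replicas="${REPLICAS:-1}"

  # 1. Stop NASS so nothing writes to the spool DB while it is cleaned.
  run oc scale deployment "$NASS_DEPLOY" --replicas=0 -n "$ns" || return 1

  # 2. Find the DX manager pod; the name suffix differs per install.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    mgr="apmservices-manager-001-example"   # placeholder name for dry runs
  else
    mgr=$(oc get pods -n "$ns" -o name | grep '^pod/apmservices-manager' | head -n 1)
    mgr="${mgr#pod/}"
  fi
  if [ -z "$mgr" ]; then
    echo "DX manager pod not found in namespace $ns" >&2
    return 1
  fi

  # 3 and 4. Back up the spool DB, then delete its files; rm runs only if tar succeeded.
  run oc exec -n "$ns" "$mgr" -- sh -c \
    "cd $SPOOL_PARENT && tar cvzf spooldbbackup.tar.gz nass_spooldb/ && rm nass_spooldb/*" || return 1

  # 5. Start NASS again.
  run oc scale deployment "$NASS_DEPLOY" --replicas="$replicas" -n "$ns"
}
```

Running it first with DRY_RUN=1 prints the commands without executing them. The backup and delete are chained with &&, so the spool files are removed only if the tar archive was written. The sketch does not wait for the old NASS pod to terminate after step 1, so it is safer to run step 1 on its own and check with oc get pods that the NASS pod is gone before continuing.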

Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html