DX AIOps - Unable to start the DX Platform services: many pods in CrashLoopBackOff and "Permission denied" in pod logs

Article ID: 273335

Products

DX Operational Intelligence, DX Application Performance Management

Issue/Introduction

Our NFS server ran out of disk space. After we freed up some space and tried to restart the DX services, many pods went into CrashLoopBackOff, and the pod logs show "Permission denied" or "Operation not permitted" errors, as shown below:

 

$ kubectl logs -ndxi dxi-postgresql-66fc6b647-q8npb

chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not permitted

$ kubectl logs -ndxi apmservices-gateway-001-5f5984dccc-l4qzx

[Saas] MALLOC_CONF: narenas:16,lg_tcache_max:13

[Saas] Jemalloc enabled: /usr/local/jemalloc5.2.1/lib/libjemalloc.so.2

'secrets/systempublic.pem' -> '/apmservices.sec/systempublic.pem'

'secrets/application-secrets.properties' -> '/apmservices.sec/bootstrap.properties'

[Saas] No debug commands are provided. Use 'APM_DEBUG_ENABLED' for JVM debug

[Saas] Or pass your own commands using 'APM_DEBUG_CUSTOM' param

[Saas] Detected APM java opts: -Xlog:gc:logs/gc.log::filecount=10,filesize=10M -XX:ErrorFile=logs/hs_err_pid-2023-09-11-1694442687.log -Xbootclasspath/a:/opt/jdk/lib/bc-fips/bc-fips.jar:/opt/jdk/lib/bc-fips/bctls-fips.jar -Xms2048m -Xmx2048m -Xss512k -XX:+UseG1GC -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -Djava.security.egd=file:/dev/random -Dcom.ca.apm.common.crypto.fipsEnable=true

Could not rename log file 'logs/gc.log' to 'logs/gc.log.7' (Permission denied).

Invalid -Xlog option '-Xlog:gc:logs/gc.log::filecount=10,filesize=10M', see error log for details.

Error: Could not create the Java Virtual Machine.

Error: A fatal exception has occurred. Program will exit.

[0.006s][error][logging] Error opening log file 'logs/gc.log': Permission denied

[0.006s][error][logging] Initialization of output 'file=logs/gc.log' using options 'filecount=10,filesize=10M' failed.

 

$ kubectl logs -ndxi apmservices-tas-001-77f5c47d7-47rt7

[Saas] MALLOC_CONF: narenas:16,lg_tcache_max:13

[Saas] Jemalloc enabled: /usr/local/jemalloc5.2.1/lib/libjemalloc.so.2

'secrets/systempublic.pem' -> '/apmservices.sec/systempublic.pem'

'secrets/application-secrets.properties' -> '/apmservices.sec/bootstrap.properties'

[Saas] No debug commands are provided. Use 'APM_DEBUG_ENABLED' for JVM debug

[Saas] Or pass your own commands using 'APM_DEBUG_CUSTOM' param

[Saas] Detected APM java opts: -Xlog:gc:logs/gc.log::filecount=10,filesize=10M -XX:ErrorFile=logs/hs_err_pid-2023-09-11-1694442777.log -Xbootclasspath/a:/opt/jdk/lib/bc-fips/bc-fips.jar:/opt/jdk/lib/bc-fips/bctls-fips.jar -Xms2048m -Xmx2048m -Xss512k -XX:+UseG1GC -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -Djava.security.egd=file:/dev/random -Dcom.ca.apm.common.crypto.fipsEnable=true

Could not rename log file 'logs/gc.log' to 'logs/gc.log.0' (Permission denied).

Invalid -Xlog option '-Xlog:gc:logs/gc.log::filecount=10,filesize=10M', see error log for details.

Error: Could not create the Java Virtual Machine.

Error: A fatal exception has occurred. Program will exit.

[0.004s][error][logging] Error opening log file 'logs/gc.log': Permission denied

[0.004s][error][logging] Initialization of output 'file=logs/gc.log' using options 'filecount=10,filesize=10M' failed.

 

$ kubectl get pods -ndxi|grep -v Running|grep -v Completed

NAME                                                   READY   STATUS              RESTARTS   AGE

apmservices-apmbacking-singleton-56f465755d-dgts5      0/1     CrashLoopBackOff    5          3m54s

apmservices-at-001-57c4d8ddfc-nvpkr                    0/1     CrashLoopBackOff    5          3m54s

apmservices-atc-001-bc9ccd885-m8h4b                    0/1     CrashLoopBackOff    5          3m54s

apmservices-blobstorage-001-5868877fcc-tc5q6           0/1     CrashLoopBackOff    5          3m53s

apmservices-cloudgw-001-5774cb8957-n5bbd               0/1     CrashLoopBackOff    5          3m53s

apmservices-gateway-001-5f5984dccc-l4qzx               0/1     CrashLoopBackOff    5          3m53s

apmservices-manager-001-8576f56b54-q446m               0/1     CrashLoopBackOff    5          3m53s

apmservices-metadata-001-879d9bf8d-829c6               0/1     CrashLoopBackOff    5          3m53s

apmservices-metricalert-001-74f6b5f859-w2p9t           0/1     CrashLoopBackOff    5          3m53s

apmservices-metricforward-001-6c58dd7977-clb45         0/1     CrashLoopBackOff    5          3m53s

apmservices-metrics-001-686c597c5f-s9hct               0/1     CrashLoopBackOff    5          3m52s

apmservices-metricsorter-001-777f46766b-z8z8b          0/1     CrashLoopBackOff    5          3m52s

apmservices-nass-001-568488996d-4nv2q                  0/1     CrashLoopBackOff    5          3m52s

apmservices-states-001-77bbfb45b7-dqvwb                0/1     CrashLoopBackOff    5          3m52s

apmservices-tas-001-77f5c47d7-47rt7                    0/1     CrashLoopBackOff    5          3m51s

apmservices-tenants-singleton-85d75b6b4f-jgcc6         0/1     CrashLoopBackOff    5          3m51s

apmservices-zookeeper-cc6ddb97d-w66bf                  0/1     CrashLoopBackOff    5          3m51s

axaservices-ba-routing-service-695d95cfc7-ghgjs        0/1     Init:0/1            0          3m51s

axaservices-dxc-7495fd566c-c7m6v                       0/1     Init:0/1            0          3m51s

axaservices-ngutils-764cdbdbbc-nlrjk                   0/1     Init:0/1            0          3m50s

axaservices-readserver-5b9f74979-85w6t                 0/1     Init:0/3            0          3m50s

doi-adminui-7fc8f97965-l2bvt                           0/1     Init:0/4            0          3m50s

doi-automic-integration-656d4bdb87-ncl9m               0/1     Init:0/3            0          3m49s

doi-cpa-ng-6746487b88-rv655                            0/1     Init:1/2            0          3m49s

doi-cpa-service-aggregation-6f789fc568-88n8m           0/1     Init:1/2            0          3m49s

doi-incidentmanagement-7fc4c7bd46-nzrfx                0/1     Init:0/2            0          3m48s

doi-integrationgateway-75df6ff44f-4ct25                0/1     Init:0/1            0          3m48s

doi-logcollector-6cc7bbffd6-4pwkx                      0/1     Error               5          3m48s

doi-maintenance-service-54595cf4cc-8lvt6               0/1     CrashLoopBackOff    5          3m48s

doi-metric-config-service-58b775b8c4-prhkh             0/1     Init:0/1            0          3m48s

doi-nim-686d9c6c6c-lwxvx                               0/1     ImageInspectError   0          3m47s

doi-normalized-alarm-6b885b8b-plkld                    0/1     Init:1/2            0          3m47s

doi-pi-projection-ddf6f845-rjkmq                       0/1     Init:1/2            0          3m46s

doi-platelemetry-78b7b84ddd-vvjjv                      0/1     Init:0/1            0          3m46s

doi-servicealarm-65b6c87f44-h6l4s                      0/1     Init:1/3            0          3m46s

doi-servicemanagement-68d74fcbc9-qf7dz                 0/1     CrashLoopBackOff    5          3m46s

doi-servicemetrics-7d9d989fb4-tstkt                    0/1     CrashLoopBackOff    5          3m46s

doi-servicerepo-5dbb98cffd-k7xrt                       0/1     CrashLoopBackOff    5          3m45s

doi-servicestatemanager-78bb779b47-xjvzh               0/1     CrashLoopBackOff    5          3m45s

doi-situations-0                                       0/1     Init:1/2            0          3m48s

doi-tenantmanagement-5d59bb4878-6d2gs                  0/1     Init:0/1            0          3m45s

doireadserver-5ff54d8bdb-k96fk                         0/1     Init:0/2            0          3m45s

dxi-adminui-77bcbcf856-k975z                           0/1     Init:0/3            0          3m44s

dxi-grafana-deployment-5945d6c4f6-wvtl9                0/1     Init:0/1            0          3m44s

dxi-grafana-image-renderer-6685755fcf-lwz58            0/1     Init:0/1            0          3m44s

dxi-grafana-reporter-687cbd4fc6-67f8v                  0/1     Init:0/1            0          3m44s

dxi-grafana-services-66bb9d6c64-4zbjx                  0/1     Init:0/1            0          3m44s

dxi-notify-6bd5795c66-jg6kb                            0/1     Init:0/2            0          3m44s

dxi-postgresql-66fc6b647-q8npb                         0/1     CrashLoopBackOff    5          3m43s

dxi-readserver-f5cbc7945-2lqk5                         0/1     Init:0/3            0          3m43s

jarvis-kafka-2-646fb5dc77-6cs9f                        0/1     CrashLoopBackOff    4          3m42s

jarvis-kafka-3-7f6f4f9867-qxj7f                        0/1     CrashLoopBackOff    4          3m42s

jarvis-kafka-65877d8cf8-chf52                          0/1     CrashLoopBackOff    4          3m42s
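To confirm that the same permission error is behind most of the failing pods, the manual checks above can be scripted. The following is a minimal sketch, not an official tool: the function names `list_failing_pods` and `permission_errors` are hypothetical helpers, and the `dxi` namespace is taken from the commands above.

```shell
# Print the names of pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods` output on stdin; header and blank lines are skipped.
list_failing_pods() {
  awk 'NF >= 3 && $1 != "NAME" && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Keep only the log lines that point to a file permission problem.
permission_errors() {
  grep -E 'Permission denied|Operation not permitted'
}

# Example usage (requires access to the cluster, so it is left commented out):
#   kubectl get pods -n dxi | list_failing_pods | while read -r pod; do
#     kubectl logs -n dxi "$pod" --tail=100 2>&1 | permission_errors | sed "s/^/$pod: /"
#   done
```

Pods that are still in an Init state may not have logs for their main container yet, so no matching lines for those pods does not rule out the same cause.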

Environment

DX Platform 21.x

Cause

The dxi NFS folders have incorrect permissions or ownership, so the containers cannot write to their data and log directories.

Resolution

Apply the correct ownership and permissions to the dxi NFS folders, as described in the documentation:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/21-3/installing/Installation-Scenarios-2131/Install-as-a-Cluster-Reader/Kubernetes---Pre-installation-Tasks-Cluster-Reader.html#concept.dita_12c36779-2050-4df9-b2cf-3986fe434b39_CreateDirectories 
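Before changing anything, it can help to list which files and folders deviate from the expected ownership. The sketch below assumes you run it on the NFS server with sufficient privileges. The function name `find_wrong_owner` is a hypothetical helper, and the directory, UID, and GID in the usage comment are placeholders that must be replaced with your NFS path and the values given in the documentation linked above.

```shell
# List every entry under a directory whose owner or group differs from the
# expected values. Numeric IDs are accepted by find's -user and -group tests.
# Usage: find_wrong_owner DIR EXPECTED_UID EXPECTED_GID
find_wrong_owner() {
  find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example usage (placeholders, not real values; take them from the documentation):
#   find_wrong_owner /path/to/dxi/nfs/folder <expected-uid> <expected-gid>
```

An empty result means every entry under the directory already has the expected owner and group. Any path that is printed is a candidate for the ownership correction described in the documentation.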

Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html