DX OI - cpa-projection pod reporting CrashLoopBackOff status

Article ID: 224289

Products

DX Operational Intelligence
DX Application Performance Management

Issue/Introduction

The cpa-projection pod keeps crashing, and scaling the deployment down and back up does not help.


kubectl -n dxi get po|grep cpa-projection

cpa-projection-65bf84d655-nb8l4                        0/1     CrashLoopBackOff   1          7s

 

kubectl -n dxi describe po cpa-projection-65bf84d655-nb8l4


QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m41s                  default-scheduler  Successfully assigned dxi/cpa-projection-65bf84d655-nb8l4 to dx02.dcma-ffm.local
  Normal   Pulled     2m13s (x5 over 3m40s)  kubelet            Container image "dx01.dcma-ffm.local:5000/dxi/doi-1.3.3-cpa-projection:22" already present on machine
  Normal   Created    2m13s (x5 over 3m40s)  kubelet            Created container cpa-projection
  Normal   Started    2m12s (x5 over 3m40s)  kubelet            Started container cpa-projection
  Warning  BackOff    105s (x10 over 3m38s)  kubelet            Back-off restarting failed container

 

Cause

The NFS server's clock was 10 minutes behind the clock of the OCS/k8s servers running the pods. Synchronizing the time resolved the issue.

The cpa-projection pod calculates the projections that are displayed on the CPA landing page. To keep this pod running continuously, a daemon process inside the pod monitors the application log. If the log has not been updated in the last 600 seconds, the pod is restarted.

Because the NFS server was 10 minutes (600 seconds) behind, the last-modified time of the files always appeared to be at least 600 seconds old. The daemon therefore treated the log as stale on every check and restarted the pod, which put it into a continuous crash loop.
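The effect of the skew on the 600-second check can be illustrated with a small sketch. This is not the actual cpa-projection monitoring code. The function names (`log_age`, `is_stale`), the sample timestamps, and the use of "greater than or equal" for the comparison are assumptions made for illustration. It also assumes the log file's last-modified time is stamped with the NFS server's clock.

```shell
#!/bin/sh
# Hypothetical sketch of a log-staleness check; not the real pod daemon.
STALE_AFTER=600   # seconds, the threshold named in this article

# log_age NOW MTIME: print the apparent age of the log in seconds
log_age() {
    echo $(( $1 - $2 ))
}

# is_stale NOW MTIME: succeed when the log looks at least STALE_AFTER seconds old
is_stale() {
    [ "$(log_age "$1" "$2")" -ge "$STALE_AFTER" ]
}

# Sample values: the pod clock reads 100000 and the log was written 5 seconds ago.
now=100000
written_at=$(( now - 5 ))

# Clocks in sync: the last-modified time matches the real write time.
if is_stale "$now" "$written_at"; then
    echo "in sync: restart"
else
    echo "in sync: healthy"
fi

# NFS server 600 seconds behind: the last-modified time is stamped 600 seconds early.
skewed_mtime=$(( written_at - 600 ))
if is_stale "$now" "$skewed_mtime"; then
    echo "skewed: restart"
else
    echo "skewed: healthy"
fi
```

With these sample values, the in-sync case reports the log as healthy, while the skewed case reports a restart even though the log was written only 5 seconds earlier.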

Environment

DX Platform 20.2.x
DX Operational Intelligence 20.2.x

Resolution

Make sure the clocks on the NFS server and the OCS/k8s servers are in sync.
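One way to check for skew is to compare epoch timestamps between a cluster node and the NFS server. The sketch below assumes ssh access from the node to the NFS server. The helper names (`abs_skew`, `check_skew`) and the host name in the usage comment are hypothetical. How the clocks are then corrected depends on the time service in use on each host.

```shell
#!/bin/sh
# Hypothetical helpers for spotting clock skew between two hosts.

# abs_skew A B: print the absolute difference between two epoch timestamps
abs_skew() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# check_skew HOST: compare the local clock with HOST's clock over ssh.
# The result includes the ssh round-trip time, so treat a few seconds as noise.
check_skew() {
    remote_now=$(ssh "$1" date -u +%s) || return 1
    local_now=$(date -u +%s)
    echo "skew vs $1: $(abs_skew "$local_now" "$remote_now") s"
}

# Usage, run from an OCS/k8s node (host name is a placeholder):
#   check_skew nfs-server.example.local
```

A result close to 600 seconds would match the situation described in this article.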

Additional Information

DX AIOPs - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815