The cpa-projection pod keeps crashing, and scaling the deployment down and up does not help.
kubectl -n dxi get po|grep cpa-projection
cpa-projection-65bf84d655-nb8l4 0/1 CrashLoopBackOff 1 7s
kubectl -n dxi describe po cpa-projection-65bf84d655-nb8l4
QoS Class: Burstable
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m41s default-scheduler Successfully assigned dxi/cpa-projection-65bf84d655-nb8l4 to dx02.dcma-ffm.local
Normal Pulled 2m13s (x5 over 3m40s) kubelet Container image "dx01.dcma-ffm.local:5000/dxi/doi-1.3.3-cpa-projection:22" already present on machine
Normal Created 2m13s (x5 over 3m40s) kubelet Created container cpa-projection
Normal Started 2m12s (x5 over 3m40s) kubelet Started container cpa-projection
Warning BackOff 105s (x10 over 3m38s) kubelet Back-off restarting failed container
The NFS server's clock was 10 minutes behind that of the pods. Synchronizing the time resolved the issue.
The cpa-projection pod calculates the projections, which are in turn displayed on the CPA landing page. To keep this pod up and running continuously, a daemon process inside the pod monitors the application log. If the log has not been updated in the last 600 seconds, the pod is restarted.
Because the NFS server was 10 minutes behind, the last-modified time of the files always appeared to be 600 seconds in the past. The daemon therefore treated the log as stale every time and restarted the pod, which put it into a continuous crash loop.
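The effect of the clock offset on a staleness check can be illustrated with a minimal sketch. This is not the product's actual monitoring code, which is not shown here. The function names, the `>=` comparison and the sample timestamp are assumptions made for illustration; only the 600-second threshold and the 10-minute offset come from the description above.

```python
# Hypothetical model of a log-staleness check; not the real daemon's code.
STALE_AFTER_SECONDS = 600  # threshold quoted in the article


def log_is_stale(log_mtime: float, pod_now: float,
                 threshold: float = STALE_AFTER_SECONDS) -> bool:
    """Return True when the log looks untouched for at least `threshold` seconds."""
    return (pod_now - log_mtime) >= threshold


def mtime_seen_by_pod(write_time: float, nfs_clock_offset: float) -> float:
    """Timestamp an NFS server would stamp if its clock is offset from the pod's."""
    return write_time + nfs_clock_offset


now = 1_700_000_000.0  # arbitrary "current time" on the pod, in epoch seconds

in_sync = mtime_seen_by_pod(now, 0.0)    # server clock matches the pod
skewed = mtime_seen_by_pod(now, -600.0)  # server clock is 10 minutes behind

print(log_is_stale(in_sync, now))  # False: a fresh write looks fresh
print(log_is_stale(skewed, now))   # True: a fresh write already looks stale
```

With the server 600 seconds behind, even a log line written this instant fails the check, so a restart cannot fix the problem.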
DX Platform 20.2.x
DX Operational Intelligence 20.2.x
Make sure the clocks on the NFS server and the OCS/k8s servers are in sync.
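One rough way to spot such an offset from a client node is to write a probe file into the NFS-backed directory and compare its modification time with the local clock. The sketch below assumes that the NFS server stamps the modification time of newly written files with its own clock, which is typical but depends on the client and mount options. The function name `measure_mtime_skew` and the probe file prefix are hypothetical, and the directory to test must be supplied by the operator.

```python
import os
import tempfile
import time


def measure_mtime_skew(directory: str) -> float:
    """Return local clock minus the mtime of a file just written in `directory`.

    A value near 0 suggests the clocks agree; a value near 600 would match
    a file server that is about 10 minutes behind the local node.
    """
    fd, path = tempfile.mkstemp(prefix="clock-skew-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())  # push the write through to the server
        return time.time() - os.stat(path).st_mtime
    finally:
        os.remove(path)  # do not leave the probe file behind


if __name__ == "__main__":
    # Replace the local temp directory with the NFS mount path to test.
    print(f"skew: {measure_mtime_skew(tempfile.gettempdir()):.1f} s")
```

This is only a quick indication. Comparing the output of `date` on each server, or checking the NTP/chrony status, is the more direct way to confirm the offset.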
DX AIOPs - Troubleshooting, Common Issues and Best Practices