APM 24.1 index-rollover pod on various nodes (over time) is stuck in ContainerCreating or, in limited cases, runs for 30 minutes

Article ID: 376414

Updated On:

Products

DX Application Performance Management

Issue/Introduction

The APM 24.1 index-rollover pod, on various nodes over time, is stuck in the ContainerCreating state or, in limited cases, runs for 30 minutes.

1: Output of the following commands:

kubectl get pods -n {namespace} | grep index-rollover
 
kubectl describe cronjob index-rollover -n {namespace}

$ kubectl logs index-rollover-xxxx -n {namespace}
Error from server (BadRequest): container "index-rollover" in pod "index-rollover-xxx" is waiting to start: ContainerCreating
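Because the problem follows the node the pod is scheduled on, it helps to see which node each stuck pod is on. The following is a minimal Python sketch, not a Broadcom tool: it assumes kubectl is on the PATH and reads the standard Pod fields spec.nodeName and status.containerStatuses[].state.waiting.reason. The function names stuck_rollover_pods and fetch_pods are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def stuck_rollover_pods(pod_list: dict, prefix: str = "index-rollover") -> dict:
    """Group pods whose name starts with `prefix` and that have a container
    waiting with reason ContainerCreating, keyed by the node they are on."""
    by_node = defaultdict(list)
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(prefix):
            continue
        statuses = pod.get("status", {}).get("containerStatuses", [])
        creating = any(
            s.get("state", {}).get("waiting", {}).get("reason") == "ContainerCreating"
            for s in statuses
        )
        if creating:
            node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
            by_node[node].append(name)
    return dict(by_node)


def fetch_pods(namespace: str) -> dict:
    """Run `kubectl get pods -n <namespace> -o json` and parse the JSON output."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, stuck_rollover_pods(fetch_pods("my-namespace")) returns a dict mapping each node name to the index-rollover pods stuck on it; "my-namespace" is a placeholder. If the same node keeps showing up, that node is the one to inspect.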

2: From the cron job events:

 Normal    JobAlreadyActive   cronjob/index-rollover              Not starting job because prior execution is running and concurrency policy is Forbid
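The JobAlreadyActive event explains why a single stuck pod blocks later runs: with concurrencyPolicy Forbid, the CronJob controller skips a scheduled run while a previous Job is still active. The sketch below is a simplified Python model of the three documented policies (Allow, Forbid, Replace), not the controller's actual code. The names next_action and simulate are hypothetical, and it assumes a stuck Job never completes while a healthy one finishes before the next scheduled time.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    ALLOW = "Allow"
    FORBID = "Forbid"
    REPLACE = "Replace"


def next_action(policy: ConcurrencyPolicy, active_jobs: int) -> str:
    """Simplified decision taken at each scheduled time."""
    if active_jobs == 0 or policy is ConcurrencyPolicy.ALLOW:
        return "start"
    if policy is ConcurrencyPolicy.FORBID:
        # Corresponds to the JobAlreadyActive event: this run is skipped.
        return "skip"
    # Replace: the running Job is stopped and a new one is started.
    return "replace-and-start"


def simulate(policy: ConcurrencyPolicy, ticks: int, stuck: bool = True) -> list:
    """Return the action taken at each of `ticks` scheduled times."""
    active = 0
    actions = []
    for _ in range(ticks):
        action = next_action(policy, active)
        actions.append(action)
        if action == "skip":
            continue
        if stuck:
            # Assumption: a stuck Job stays active forever.
            active = active + 1 if action == "start" else 1
        else:
            # Assumption: a healthy Job finishes before the next tick.
            active = 0
    return actions
```

With Forbid and a stuck Job, simulate returns "start" followed only by "skip" entries, so no new index-rollover run happens until the stuck pod is removed, which is consistent with the workaround described in the Resolution.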

3: From the node/Jarvis side (containerd log):

XXXX containerd[id]: time=<timeformat> level=error msg="ContainerStatus for \"zzzz1234\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"zzzz1234\": not found"
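To check whether a node keeps logging these errors, its containerd entries can be filtered for NotFound ContainerStatus failures. This is a minimal sketch that assumes the log lines have the format shown above (with escaped \" quotes inside msg="...") and are collected, for example, with journalctl -u containerd on a node where containerd runs as a systemd unit. not_found_container_ids is a hypothetical helper.

```python
import re

# Matches the escaped quotes (\") that containerd writes inside msg="...".
_NOT_FOUND = re.compile(
    r'ContainerStatus for \\"(?P<cid>[^"\\]+)\\" failed".*code = NotFound'
)


def not_found_container_ids(lines):
    """Return container IDs from 'ContainerStatus ... NotFound' errors,
    without duplicates, in the order they first appear."""
    seen = []
    for line in lines:
        m = _NOT_FOUND.search(line)
        if m and m.group("cid") not in seen:
            seen.append(m.group("cid"))
    return seen
```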

Environment

APM 24.1 platform version.

Resolution

The issue was with the node on which the index-rollover pod was scheduled: for an unknown reason, the node did not allow the previous cron job run to complete. Deleting the pod so that it was assigned to a different node resolved the problem, and the job then ran normally. The underlying node issue was later identified as follows.
 
A clash between /etc/exports and /etc/fstab, both mapping /nfs/ca/dxi. The server, previously the NFS server for the 21.3 environment, had been repurposed and now performs the role of a worker node in the new 24.1 environment.
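A quick way to spot this kind of clash on a repurposed server is to compare the directories exported in /etc/exports with the paths referenced in /etc/fstab. The sketch below assumes "clash" means the same path appears in both files (as an export and as an NFS source path or mount point). It handles only the simple whitespace-separated format without quoted paths, and find_clashes is a hypothetical helper.

```python
def _entries(text):
    """Yield whitespace-split fields of non-blank, non-comment lines."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            yield line.split()


def _norm(path):
    """Drop a trailing slash, keeping "/" itself."""
    return path.rstrip("/") or "/"


def exported_paths(exports_text):
    # /etc/exports: the first field is the exported directory.
    return {_norm(fields[0]) for fields in _entries(exports_text)}


def fstab_paths(fstab_text):
    # /etc/fstab: field 1 is the source (for NFS, "server:/path"),
    # field 2 is the mount point.
    paths = set()
    for fields in _entries(fstab_text):
        if len(fields) < 2:
            continue
        source, mount_point = fields[0], fields[1]
        if ":" in source:
            paths.add(_norm(source.split(":", 1)[1]))
        paths.add(_norm(mount_point))
    return paths


def find_clashes(exports_text, fstab_text):
    """Paths that are both exported and referenced in fstab."""
    return sorted(exported_paths(exports_text) & fstab_paths(fstab_text))
```

On the node itself this could be run as `with open("/etc/exports") as e, open("/etc/fstab") as f: print(find_clashes(e.read(), f.read()))`; a non-empty result points at entries worth reviewing before the server is used as a worker node.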
 
After removing all entries from /etc/exports and rebooting the node, kubelet and the containers function properly.