kubectl get pods -n pinniped-supervisor
NAMESPACE             NAME                                   READY   STATUS
pinniped-supervisor   pinniped-post-deploy-job-ver-1-264sn   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-4fvkj   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-88s9q   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-b6frc   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-h4vwd   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-ks8t2   0/1     Error
pinniped-supervisor   pinniped-post-deploy-job-ver-1-mxfn8   0/1     Error
pinniped-supervisor   pinniped-supervisor-56fbb8cffd-65k2g   1/1     Running
To find the root cause of the problem, run the following commands:
kubectl get jobs -n pinniped-supervisor
NAME                             COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job-ver-1   0/1           33m        33m
kubectl describe jobs pinniped-post-deploy-job-ver-1 -n pinniped-supervisor
Events:
Type     Reason                Age    From            Message
----     ------                ----   ----            -------
Normal   SuccessfulCreate      30m    job-controller  Created pod: pinniped-post-deploy-job-ver-1-mxfn8
Normal   SuccessfulCreate      19m    job-controller  Created pod: pinniped-post-deploy-job-ver-1-264sn
<-------TRUNCATED--------->
Warning  BackoffLimitExceeded  2m38s  job-controller  Job has reached the specified backoff limit
From the above output, you can see that the job has exceeded its backoff limit, which defaults to 6. The job controller recreates failed pods that belong to a job with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. The back-off count is reset when one of the job's pods is deleted, or when a pod succeeds without any other pods for the job failing around that time. For more detail, see the sections on Jobs and the pod backoff failure policy in the Kubernetes documentation.
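As a rough illustration of the back-off schedule described above, the following sketch prints the nominal delay before each retry. It assumes the delay starts at 10 seconds, doubles after every failure, and is capped at 360 seconds (six minutes). The function name backoff_delays is hypothetical, and the timing the job controller actually applies may differ from these nominal values.

```shell
#!/bin/sh
# Sketch only: nominal back-off delays for a Job, assuming a 10s starting
# delay that doubles per failure and is capped at 360s (six minutes).
# backoff_delays is a hypothetical helper, not part of kubectl.
backoff_delays() {
  limit=$1      # number of retries to print (the backoff limit)
  delay=10      # assumed starting delay in seconds
  cap=360       # assumed cap in seconds
  i=1
  while [ "$i" -le "$limit" ]; do
    echo "retry $i: ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
}

# With the default limit of 6 this prints 10s, 20s, 40s, 80s, 160s and 320s.
backoff_delays 6
```

With a larger limit, for example backoff_delays 8, the seventh and eighth delays stay at the 360s cap.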
This can happen when the pods in the pinniped-supervisor namespace were not in Ready status or took a long time to become healthy. Before applying the resolution to this problem, make sure the pods in your pinniped-supervisor and pinniped-concierge namespaces are in Ready status.
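One way to script this readiness check is sketched below. It is an approximation: it inspects Deployments rather than individual pods, because the failed job pods themselves never become Ready, and it assumes the READY column of kubectl get deployments has the form ready/desired. The function name deployments_ready is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed when every Deployment in a namespace reports all of
# its replicas as ready. deployments_ready is a hypothetical helper name.
deployments_ready() {
  ns=$1
  # Column 2 of "kubectl get deployments" is READY, e.g. "1/1".
  # Fail if any row has ready != desired, or if no rows are returned.
  kubectl get deployments -n "$ns" --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) bad = 1 }
         END { exit (bad || NR == 0) }'
}
```

For example, running deployments_ready pinniped-supervisor && deployments_ready pinniped-concierge returns success only when both namespaces report fully ready Deployments.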
Once you have resolved the issues with the Pinniped deployment, you can fix this problem by deleting the pinniped-post-deploy-job job in the pinniped-supervisor namespace. Follow the steps below to delete the job and monitor the pinniped app until it reconciles successfully.
kubectl get app -n tkg-system pinniped
NAME       DESCRIPTION                                  SINCE-DEPLOY   AGE
pinniped   Reconcile failed: Deploying: exit status 1   4m43s          49m
kubectl delete jobs.batch -n pinniped-supervisor pinniped-post-deploy-job-ver-1
job.batch "pinniped-post-deploy-job-ver-1" deleted
kapp-controller reconciles the app object every 5 minutes, so you may have to wait some time before the reconciliation starts. Once the app has started reconciling, you should see output similar to the following:
kubectl get app -n tkg-system pinniped
NAME       DESCRIPTION   SINCE-DEPLOY   AGE
pinniped   Reconciling   5s             49m
kubectl get app -n tkg-system pinniped
NAME       DESCRIPTION           SINCE-DEPLOY   AGE
pinniped   Reconcile succeeded   47s            50m
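Rather than rerunning the command by hand, you could poll the app status in a loop. The sketch below assumes that the DESCRIPTION column is backed by the .status.friendlyDescription field of the App resource; verify this against your kapp-controller version. The function name wait_for_app and its default retry count and interval are hypothetical.

```shell
#!/bin/sh
# Sketch only: poll the pinniped app until it reports "Reconcile succeeded".
# wait_for_app is a hypothetical helper; the jsonpath field is an assumption.
wait_for_app() {
  tries=${1:-60}      # maximum number of polls
  interval=${2:-10}   # seconds to sleep between polls
  while [ "$tries" -gt 0 ]; do
    desc=$(kubectl get app -n tkg-system pinniped \
      -o jsonpath='{.status.friendlyDescription}' 2>/dev/null) || desc="unknown"
    echo "pinniped: $desc"
    # Keep polling on "Reconcile failed" too: the app stays in that state
    # until kapp-controller starts its next reconciliation.
    if [ "$desc" = "Reconcile succeeded" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

Calling wait_for_app with no arguments polls up to 60 times at 10-second intervals, which comfortably covers one 5-minute reconciliation cycle.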
kubectl get jobs -n pinniped-supervisor
NAME                             COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job-ver-1   1/1           9s         62s
kubectl get pods -n pinniped-supervisor
NAME                                   READY   STATUS      RESTARTS   AGE
pinniped-post-deploy-job-ver-1-fdk6n   0/1     Completed   0          70s
pinniped-supervisor-56fbb8cffd-6b8bh   1/1     Running     0          65s
pinniped-supervisor-56fbb8cffd-dkt6c   1/1     Running     0          65s
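The final check on the recreated job can also be scripted. This minimal sketch reads the job's .status.succeeded count and treats a value of at least 1 as success; the function name job_succeeded is hypothetical, and the job name is taken from the output above.

```shell
#!/bin/sh
# Sketch only: succeed when a Job reports at least one succeeded pod.
# job_succeeded is a hypothetical helper name.
job_succeeded() {
  ns=$1
  job=$2
  succeeded=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.succeeded}') || return 1
  # The field is empty until a pod has succeeded, so default to 0.
  [ "${succeeded:-0}" -ge 1 ]
}
```

For example, job_succeeded pinniped-supervisor pinniped-post-deploy-job-ver-1 returns success once the COMPLETIONS column shows 1/1.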