AKO pod in CrashLoopBackOff state - recreating every 5 minutes

Article ID: 397598


Products

VMware Tanzu Kubernetes Grid
Tanzu Kubernetes Runtime
VMware Avi Load Balancer
VMware NSX Advanced Load Balancer

Issue/Introduction

  • Symptoms:
  • Running kubectl get pods -n vmware-system-ako shows the AKO pod in CrashLoopBackOff with only one of its two containers ready:
    • vmware-system-ako  ako-controller-manager-########  1/2     CrashLoopBackOff
  • The pod logs show the following errors:
    • "infra" container within the pod:

      E0331 15:46:32.690504       1 avisession.go:668] Client error for URI: login. Error: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      E0331 15:46:32.691096       1 avisession.go:714] CheckControllerStatus is disabled for this session, not going to retry.

      E0331 15:46:32.691170       1 avisession.go:716] Failed to invoke API. Error: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      E0331 15:46:32.691215       1 avisession.go:383] response error: Rest request error, returning to caller: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      2025-03-31T15:46:32.691Z        ERROR   ingestion/vcf_k8s_controller.go:381     Failed to connect to AVI controller using secret provided by NCP, the secret would be deleted, err: Rest request error, returning to caller: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

    • "manager" container within the pod:

      E0331 15:47:32.269824       1 avisession.go:668] Client error for URI: login. Error: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      E0331 15:47:32.271411       1 avisession.go:714] CheckControllerStatus is disabled for this session, not going to retry.

      E0331 15:47:32.271428       1 avisession.go:716] Failed to invoke API. Error: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      E0331 15:47:32.271495       1 avisession.go:383] response error: Rest request error, returning to caller: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      2025-03-31T15:47:32.271Z        ERROR   k8s/ako_init.go:384     AVI controller initialization failed with err: Rest request error, returning to caller: Post "https://##.##.##.###/login": x509: certificate signed by unknown authority

      2025-03-31T15:47:32.278Z        INFO    lib/dynamic_client.go:377       init Secret not found, retrying...

      2025-03-31T15:47:37.280Z        INFO    lib/dynamic_client.go:377       init Secret not found, retrying...

      2025-03-31T15:47:42.283Z        FATAL   lib/dynamic_client.go:374       Found new init secret, rebooting AKO

       

  • The nsxt-alb account may also be getting locked out on the Avi controller.
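
To confirm the symptom without reading the full logs, the output of each container can be filtered for the certificate error. The following is a minimal sketch rather than an official procedure: the helper name count_x509_errors is hypothetical, <ako-pod-name> is a placeholder for the pod name returned by kubectl get pods, and the container names infra and manager are taken from the log excerpts above.

```shell
# Hypothetical helper: reads log text on stdin and prints how many lines
# contain the certificate error shown in the log excerpts above.
count_x509_errors() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # helper usable in scripts that run with "set -e".
  grep -c 'x509: certificate signed by unknown authority' || true
}

# Example usage (requires cluster access; replace <ako-pod-name>):
#   kubectl -n vmware-system-ako logs <ako-pod-name> -c infra   | count_x509_errors
#   kubectl -n vmware-system-ako logs <ako-pod-name> -c manager | count_x509_errors
```

A non-zero count in either container matches the symptom described in this article.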

Environment

TKGS 8.x

Cause

The AKO pod expects an init secret that does not exist. Because the secret is missing, the AKO pod keeps restarting.

The NCP pod is responsible for creating this secret and, for reasons that vary by environment, has failed to do so.

 

 

Resolution

Delete the NCP pods so that they are recreated:


kubectl -n vmware-system-nsx get pods
 
NAME                READY   STATUS    RESTARTS   AGE
nsx-ncp-<pod1-ID>   2/2     Running   0          49d
nsx-ncp-<pod2-ID>   2/2     Running   0          49d
 
kubectl -n vmware-system-nsx delete pod nsx-ncp-<pod1-ID>
kubectl -n vmware-system-nsx delete pod nsx-ncp-<pod2-ID>
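
If there are several NCP pods, the same deletion can be scripted instead of typing each pod name. This is only a sketch under stated assumptions: the helper names filter_ncp_pods and delete_ncp_pods are hypothetical, and it assumes every NCP pod name starts with nsx-ncp-, as in the listing above. Review the pod list before running it.

```shell
NCP_NS="vmware-system-nsx"

# Hypothetical helper: keeps only NCP pods from "kubectl get pods -o name"
# output, which prints one "pod/<name>" entry per line.
filter_ncp_pods() {
  grep '^pod/nsx-ncp-' || true
}

# Hypothetical helper: deletes every NCP pod in the namespace, one at a time,
# so that the pods are recreated.
delete_ncp_pods() {
  kubectl -n "$NCP_NS" get pods -o name | filter_ncp_pods |
    while read -r pod; do
      kubectl -n "$NCP_NS" delete "$pod"
    done
}

# Example usage (requires cluster access):
#   delete_ncp_pods
#   kubectl -n vmware-system-nsx get pods
```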


After the NCP pods are recreated, the avi-init-secret should appear (or be recreated), which triggers a restart of the AKO service.
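
One way to check for the secret is to poll for it and then look at the AKO pod again. The sketch below is illustrative only: the helper name wait_for_init_secret and the POLL_SECONDS variable are hypothetical, and it assumes the secret is created in the vmware-system-ako namespace, which this article does not state explicitly. Adjust AKO_NS if your environment differs.

```shell
# Assumption: avi-init-secret lives in the AKO namespace; adjust if needed.
AKO_NS="vmware-system-ako"

# Hypothetical helper: returns 0 as soon as avi-init-secret exists.
# Polls up to the given number of attempts (default 24), sleeping
# POLL_SECONDS (default 5) between attempts; returns 1 if it never appears.
wait_for_init_secret() {
  attempts="${1:-24}"
  while [ "$attempts" -gt 0 ]; do
    if kubectl -n "$AKO_NS" get secret avi-init-secret >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}

# Example usage (requires cluster access):
#   wait_for_init_secret && kubectl -n vmware-system-ako get pods
```

If the helper returns non-zero, the secret was not found within the polling window.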



If this fails, or if the NCP pods have any issue coming back up, please contact Support for assistance.