AKO-0 Pod Crashing After Management Cluster Upgrade to TKGm v2.5.3


Article ID: 400111


Products

Tanzu Kubernetes Runtime

Issue/Introduction

After upgrading the management cluster to TKGm v2.5.3, the AKO pod on the management cluster enters a CrashLoopBackOff state. Upon inspecting the pod logs, you will see the following error:

2025-01-01T12:00:00.000Z	WARN	cache/controller_obj_cache.go:3067	Invalid input detected, AKO will be rebooted to retry fetching node network list failed with error: nodeNetworkList not set in values yaml, syncing will be disabled
2025-01-01T12:00:00.000Z	INFO	api/api.go:69	Shutting down the API server
2025-01-01T12:00:00.000Z	INFO	api/api.go:114	API server shutdown: http: Server closed
2025-01-01T12:00:00.000Z	ERROR	k8s/ako_init.go:268	Error while validating input: fetching node network list failed with error: nodeNetworkList not set in values yaml, syncing will be disabled
2025-01-01T12:00:00.000Z	ERROR	ako-main/main.go:321	Handle configmap error during reboot, shutting down AKO. Error is: sync is disabled because of configmap unavailability during bootup 
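You can confirm the pod state and retrieve these logs with the commands below; this is a minimal sketch assuming the default AKO deployment, where the pod runs as ako-0 in the avi-system namespace:

# Check the AKO pod status on the management cluster
kubectl get pods -n avi-system

# Review the logs from the last crashed container for the nodeNetworkList error
kubectl logs ako-0 -n avi-system --previous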

Environment

TKGm 2.5.3

Cause

This is a known issue with TKGm 2.5.3: after the upgrade, the AKODeploymentConfig (ADC) for the management cluster does not have spec.extraConfigs.ingress.nodeNetworkList set, so AKO fails input validation at startup. It will be resolved in a future release.

Resolution

NOTE: This workaround can be applied proactively, before upgrading the management and workload clusters. Doing so prevents the AKO pod from entering a CrashLoopBackOff (CLBO) state during the upgrade.

To resolve this issue, modify the AKODeploymentConfig (ADC) objects on the management cluster.

From the management cluster context, list the existing ADC objects:

kubectl get adc -A
NAME                                 AGE
install-ako-for-all                  23h
install-ako-for-management-cluster   23h
install-ako-for-node-port-local      23h

Take a backup of the install-ako-for-management-cluster ADC:

kubectl get adc install-ako-for-management-cluster -n tkg-system -oyaml > install-ako-for-management-cluster-backup.yaml
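If you need to roll back, the backup can be re-applied; note that you may first need to strip read-only metadata fields (such as resourceVersion and uid) from the dump:

# Restore the original ADC from the backup taken above
kubectl apply -f install-ako-for-management-cluster-backup.yaml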

Edit the ADC and add the correct node network under spec.extraConfigs.ingress.nodeNetworkList for the management cluster:

kubectl edit adc install-ako-for-management-cluster -n tkg-system

spec:
  extraConfigs:
...
    ingress:
...
      nodeNetworkList:
      - networkName: <NodeNetworkName>
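Here, <NodeNetworkName> is the name of the network (port group) that the cluster nodes are attached to. If you prefer a non-interactive change, the same edit can be made with kubectl patch; the following is a minimal sketch, keeping the placeholder network name:

# Set spec.extraConfigs.ingress.nodeNetworkList in one step
# (a merge patch replaces the whole nodeNetworkList array)
kubectl patch adc install-ako-for-management-cluster -n tkg-system --type merge \
  -p '{"spec":{"extraConfigs":{"ingress":{"nodeNetworkList":[{"networkName":"<NodeNetworkName>"}]}}}}'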

Finally, delete the AKO pod; it will be recreated automatically and should come back up in a Running state.
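The following is a minimal sketch, again assuming the pod runs as ako-0 in the avi-system namespace:

# Delete the crashing pod; its controller recreates it with the updated configuration
kubectl delete pod ako-0 -n avi-system

# Watch until the new pod reaches Running
kubectl get pods -n avi-system -w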

You can follow the same steps to resolve a crashing AKO pod on a workload cluster; in that case, edit the install-ako-for-all ADC instead.
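For example, here is a sketch of the workload cluster variant, again using the placeholder network name:

# From the management cluster context, back up and edit the ADC that applies to workload clusters
kubectl get adc install-ako-for-all -n tkg-system -oyaml > install-ako-for-all-backup.yaml
kubectl edit adc install-ako-for-all -n tkg-system    # add nodeNetworkList as shown above

# From the workload cluster context, delete the crashing AKO pod
kubectl delete pod ako-0 -n avi-system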