If you encounter this issue, you’ll notice that the AKO pod on the management cluster enters a CrashLoopBackOff state after the upgrade. Upon inspecting the pod logs, you’ll see the following error:
2025-01-01T12:00:00.000Z WARN cache/controller_obj_cache.go:3067 Invalid input detected, AKO will be rebooted to retry fetching node network list failed with error: nodeNetworkList not set in values yaml, syncing will be disabled
2025-01-01T12:00:00.000Z INFO api/api.go:69 Shutting down the API server
2025-01-01T12:00:00.000Z INFO api/api.go:114 API server shutdown: http: Server closed
2025-01-01T12:00:00.000Z ERROR k8s/ako_init.go:268 Error while validating input: fetching node network list failed with error: nodeNetworkList not set in values yaml, syncing will be disabled
2025-01-01T12:00:00.000Z ERROR ako-main/main.go:321 Handle configmap error during reboot, shutting down AKO. Error is: sync is disabled because of configmap unavailability during bootup
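For reference, a quick way to confirm the pod state and pull these logs, assuming AKO runs as the ako-0 pod in the default avi-system namespace (adjust the names if your deployment differs):
kubectl get pods -n avi-system
kubectl logs ako-0 -n avi-system --previous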
This is a known issue with TKGm 2.5.3. It will be resolved in a future release.
NOTE: This workaround can be proactively applied before upgrading both management and workload clusters. Doing so will prevent the AKO pod from entering a CrashLoopBackOff (CLBO) state during the upgrade process.
To resolve this issue, modify the AKODeploymentConfig (ADC) objects on the management cluster.
From the management cluster context, list the existing AKODeploymentConfig (ADC) objects:
kubo@bAGXWtkIklzwT:~$ kubectl get adc -A
NAME                                 AGE
install-ako-for-all                  23h
install-ako-for-management-cluster   23h
install-ako-for-node-port-local      23h
Take a backup of the install-ako-for-management-cluster ADC:
kubectl get adc install-ako-for-management-cluster -n tkg-system -oyaml > install-ako-for-management-cluster-backup.yaml
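Optionally, you can first confirm that nodeNetworkList is unset by querying the field directly; a quick check, assuming the same ADC name:
kubectl get adc install-ako-for-management-cluster -n tkg-system -o jsonpath='{.spec.extraConfigs.ingress.nodeNetworkList}'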
Edit it and add the correct node network to spec.extraConfigs.ingress.nodeNetworkList for the management cluster:
kubectl edit adc install-ako-for-management-cluster -n tkg-system
spec:
  extraConfigs:
    ...
    ingress:
      ...
      nodeNetworkList:
      - networkName: <NodeNetworkName>
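If you prefer a non-interactive change, a merge patch sets the same field; a sketch using the same placeholder network name:
kubectl patch adc install-ako-for-management-cluster -n tkg-system --type merge -p '{"spec":{"extraConfigs":{"ingress":{"nodeNetworkList":[{"networkName":"<NodeNetworkName>"}]}}}}'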
Finally, delete the AKO pod; it should be recreated and return to a Running state.
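For example, assuming the default avi-system namespace and the ako-0 pod name:
kubectl delete pod ako-0 -n avi-system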
You can follow the same steps to resolve a crashing AKO pod on a workload cluster; in that case, edit the install-ako-for-all ADC instead.
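A sketch of the equivalent steps, assuming the same placeholder values: edit the ADC from the management cluster context, then delete the AKO pod from the affected workload cluster's context.
kubectl edit adc install-ako-for-all -n tkg-system
kubectl delete pod ako-0 -n avi-system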