Symptoms
# kubectl -n avi-system logs ako-0
2024-07-08T08:23:03.451Z WARN rest/rest_operation.go:304 key: admin/global, msg: RestOp method PATCH path /api/vrfcontext/vrfcontext-d19e04cc-a0bf-45e6-9557-f579ff6494f5 tenant admin Obj {"static_routes":[{"next_hop":{"addr":"192.168.13.11","type":"V4"}...},.... returned err {"code":0,"message":"map[error:Overlapping routes found]","Verb":"PATCH","Url":"https://####//api/vrfcontext/vrfcontext-d19e04cc-a0bf-45e6-9557-f579ff6494f5","HttpStatusCode":400} with response null
AKO cannot register new routes with the ALB controller because old routing entries with the wrong label already exist. As a result, the AVI Service Engine (AVI-SE) cannot install the correct routing information from the ALB controller.
Resolution
1. Clean up the old routing entries on the ALB controller, matching them against the Kubernetes node IP addresses (see the sketch after this list)
2. Restart the AKO pod; AKO will then recreate fresh routing entries
kubectl -n avi-system delete pod ako-0
kubectl -n avi-system get pods # STATUS should be Running
kubectl -n avi-system logs ako-0 # confirm the log contains no ERROR lines
3. Check that the VIPs are reachable again
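For step 1, the stale entries can be removed from the Avi shell on the ALB controller. This is a minimal sketch, assuming the standard Avi shell pattern of removing repeated-field entries by index (no static_routes index N); check the index numbers with show vrfcontext global first, and verify the syntax against your controller version:
ssh admin@$ALB_CONTROLLER_IPADDR
shell # log in as admin
configure vrfcontext global # pick the target cloud when prompted
no static_routes index 1 # repeat for each stale route index
save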
Troubleshooting Steps
Check the target node IP addresses
kubectl get nodes -owide
#> NAME STATUS INTERNAL-IP
#> workload-cluster-A-75hv8-4qpvk Ready 192.168.13.11
#> workload-cluster-A-md-0-92b72-7c67d684c-57rhn Ready 192.168.13.47
#> workload-cluster-A-md-0-92b72-7c67d684c-jtrms Ready 192.168.13.51
#> workload-cluster-A-md-0-92b72-7c67d684c-nmzmp Ready 192.168.13.56
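To compare these addresses against the next_hop entries on the controller in a script, the InternalIP list can be pulled out directly with a JSONPath query:
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
#> 192.168.13.11
#> 192.168.13.47
#> 192.168.13.51
#> 192.168.13.56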
Check the current AKO configuration
kubectl -n avi-system get cm avi-k8s-config -oyaml | grep -E 'cloudName:|clusterName:|cniPlugin:|controllerVersion:|serviceEngineGroupName:|serviceType:'
#> cloudName: Default-Cloud
#> clusterName: workload-cluster-A
#> cniPlugin: antrea
#> controllerVersion: 22.1.2
#> serviceEngineGroupName: test-seg
#> serviceType: ClusterIP
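clusterName is the value to watch: as the show vrfcontext output below shows, AKO uses it both as the prefix of each route_id (workload-cluster-A-1, -2, ...) and as the clustername label on the static routes, so a mismatch here is what leaves orphaned entries behind. To extract it for scripted comparison (assuming the keys live under .data, as the grep above implies):
kubectl -n avi-system get cm avi-k8s-config -o jsonpath='{.data.clusterName}'
#> workload-cluster-A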
Check the routing table in the AVI-SE
ssh admin@$AVI_SE_IPADDR
sudo ip netns list
#> avi_ns1 (id: 0)
#> avi_poll_ns1
#> avi_poll_ns2
sudo ip netns exec avi_ns1 ip route
#> 100.96.0.0/24 via 192.168.13.11 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> 100.96.1.0/24 via 192.168.13.51 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> 100.96.2.0/24 via 192.168.13.47 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> 100.96.3.0/24 via 192.168.13.56 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
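The expected route set can be derived from the cluster itself. A sketch that prints the expected prefix/next-hop pairs for comparison with the ip route output above, assuming the CNI routes each node's .spec.podCIDR via that node's InternalIP, as Antrea does by default:
kubectl get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{" via "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
#> 100.96.0.0/24 via 192.168.13.11
#> 100.96.2.0/24 via 192.168.13.47
#> 100.96.1.0/24 via 192.168.13.51
#> 100.96.3.0/24 via 192.168.13.56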
Check the routing information on the ALB controller
ssh admin@$ALB_CONTROLLER_IPADDR
admin@avi-controller:~$ shell
Login: admin
Password:
[admin:avi-controller]: > show vrfcontext global
Multiple objects found for this query.
[0]: vrfcontext-f790b561-e7f2-49ee-b08c-425aac54e286#global in tenant admin, Cloud Default-Cloud
[1]: vrfcontext-69cacb9b-c92f-4432-b038-57a4ce1196d2#global in tenant admin, Cloud TEST-CLOUD
Select one: 0 # <------ Choose the target Cloud number
+------------------+-------------------------------------------------+
| Field | Value |
+------------------+-------------------------------------------------+
| uuid | vrfcontext-f790b561-e7f2-49ee-b08c-425aac54e286 |
| name | global |
| static_routes[1] | |
| prefix | 100.96.0.0/24 |
| next_hop | 192.168.13.11 |
| route_id | workload-cluster-A-1 |
| labels[1] | |
| key | clustername |
| value | workload-cluster-A | # CHECK: does this label value match "clusterName" in the AKO ConfigMap?
| static_routes[2] | |
| prefix | 100.96.2.0/24 |
| next_hop | 192.168.13.47 | # CHECK: is any next_hop duplicated across static routes?
| route_id | workload-cluster-A-2 |
| labels[1] | |
| key | clustername |
| value | workload-cluster-A |
| static_routes[3] | |
| prefix | 100.96.1.0/24 |
| next_hop | 192.168.13.51 |
| route_id | workload-cluster-A-3 |
| labels[1] | |
| key | clustername |
| value | workload-cluster-A |
| static_routes[4] | |
| prefix | 100.96.3.0/24 |
| next_hop | 192.168.13.56 |
| route_id | workload-cluster-A-4 |
| labels[1] | |
| key | clustername |
| value | workload-cluster-A |
| system_default | True |
| lldp_enable | True |
| tenant_ref | admin |
| cloud_ref | Default-Cloud |
+------------------+-------------------------------------------------+
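The same data is available over the REST API, which is handy for auditing several clusters at once. A minimal sketch, assuming basic auth is enabled and that the cloud_ref.name query filter is supported on this controller version (the X-Avi-Version header should match controllerVersion from the AKO ConfigMap):
curl -sk -u admin -H "X-Avi-Version: 22.1.2" \
  "https://$ALB_CONTROLLER_IPADDR/api/vrfcontext?name=global&cloud_ref.name=Default-Cloud" \
  | python3 -m json.tool
Look for static_routes entries whose clustername label does not match any live cluster, or whose next_hop is no longer a node InternalIP; those are the stale entries to delete in step 1.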