NSX-ALB VIP health-check failure with "Overlapping routes found" when using ClusterIP mode in TKG

Article ID: 372626

Updated On: 02-28-2025

Products

VMware Tanzu Kubernetes Grid
VMware NSX Advanced Load Balancer

Issue/Introduction

  • VIP access fails because the AVI health check has already failed
  • When switching to NodePort mode, access works as expected
  • Static route entries appear to exist in the ALB controller UI
  • The AKO pod log reports "error: Overlapping routes found":
# kubectl -n avi-system logs ako-0
2024-07-08T08:23:03.451Z WARN  rest/rest_operation.go:304  key: admin/global, msg: RestOp method PATCH path /api/vrfcontext/vrfcontext-d19e04cc-a0bf-45e6-9557-f579ff6494f5 tenant admin Obj {"static_routes":[{"next_hop":{"addr":"192.168.13.11","type":"V4"}...},....  returned err {"code":0,"message":"map[error:Overlapping routes found]","Verb":"PATCH","Url":"https://####//api/vrfcontext/vrfcontext-d19e04cc-a0bf-45e6-9557-f579ff6494f5","HttpStatusCode":400} with response null
  • ALB controller UI shows "Failure: Overlapping routes found"

Environment

  • All TKG versions
  • NSX-ALB used in ClusterIP mode (dedicated AVI-SE per Kubernetes cluster)

Cause

AKO cannot register new routes with the ALB controller because stale routing entries with the wrong label already exist. As a result, the AVI-SE cannot install the correct routing information from the ALB controller.

  • The ALB controller UI does not show label information, so the UI alone cannot tell you whether a routing entry is valid
    • The UI only shows the static route entries themselves: (ALB controller UI --> Infrastructure --> Cloud Resources --> VRF Context --> global --> Edit --> Static Route)
    • The "Additional Information" section of this KB explains how to check a routing entry together with its label via the CLI; an API-based example also follows this list

Resolution

  1. Clean up the stale routing entries based on the Kubernetes node IP addresses
    1. AVI-UI --> Infrastructure --> Cloud Resources --> VRF Context --> Select the target VRF (default: global) --> Edit --> Static Route
    2. Find the "Next Hop" values that correspond to the Kubernetes node IP addresses
    3. Click the trash icon for each stale routing entry
    4. Click "SAVE"
  2. Restart the AKO pod; AKO will then recreate fresh routing entries

    kubectl -n avi-system delete pod ako-0
    kubectl -n avi-system get pods # Status: running
    kubectl -n avi-system logs ako-0 # Check no ERROR
  3. Check VIP access (an optional verification example follows this list)
  • The VirtualService health checks should all be green
  • The expected routing entries are installed on the AVI-SE
  • VIP access works as expected
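
As an optional verification, the commands below wait for the recreated AKO pod and search its log for the original error. This is a sketch that assumes AKO runs as the StatefulSet "ako" in the avi-system namespace, which matches the ako-0 pod name used above.

kubectl -n avi-system rollout status statefulset/ako   # wait until the recreated pod is Ready
kubectl -n avi-system logs ako-0 | grep -i "overlapping routes" || echo "no overlapping-route errors logged"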

Additional Information

Troubleshooting Steps

Check the target node IP addresses

kubectl get nodes -owide
#> NAME                                            STATUS    INTERNAL-IP    
#> workload-cluster-A-75hv8-4qpvk                  Ready     192.168.13.11  
#> workload-cluster-A-md-0-92b72-7c67d684c-57rhn   Ready     192.168.13.47  
#> workload-cluster-A-md-0-92b72-7c67d684c-jtrms   Ready     192.168.13.51  
#> workload-cluster-A-md-0-92b72-7c67d684c-nmzmp   Ready     192.168.13.56  
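
The static routes that AKO programs map each node's Pod CIDR (the route prefix) to that node's IP address (the next hop). To see the expected prefixes, you can also list the per-node Pod CIDRs. This assumes the CNI uses the node's .spec.podCIDR, which is the default with Antrea:

kubectl get nodes -o custom-columns=NAME:.metadata.name,POD-CIDR:.spec.podCIDR
# The POD-CIDR values should match the static route prefixes shown in the VRF context output further below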

 

Check the current AKO configuration

kubectl -n avi-system get cm avi-k8s-config -oyaml | grep -E 'cloudName:|clusterName:|cniPlugin:|controllerVersion:|serviceEngineGroupName:|serviceType:'
#> cloudName: Default-Cloud
#> clusterName: workload-cluster-A
#> cniPlugin: antrea
#> controllerVersion: 22.1.2
#> serviceEngineGroupName: test-seg
#> serviceType: ClusterIP

 

Check the routing table in the AVI-SE

ssh admin@$AVI_SE_IPADDR

sudo ip netns list
#> avi_ns1 (id: 0)
#> avi_poll_ns1
#> avi_poll_ns2

sudo ip netns exec avi_ns1 ip route
#> ###.##.0.0/24 via 192.168.13.11 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> ###.##.1.0/24 via 192.168.13.51 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> ###.##.2.0/24 via 192.168.13.47 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)
#> ###.##.3.0/24 via 192.168.13.56 dev avi_eth8 metric 30000 (<--- required, but missing while the issue occurs)

 

Check the routing information on the ALB controller

ssh admin@$ALB_CONTROLLER_IPADDR

admin@avi-controller:~$ shell
Login: admin
Password:

[admin:avi-controller]: > show vrfcontext global
Multiple objects found for this query.
        [0]: vrfcontext-f790b561-e7f2-49ee-b08c-425aac54e286#global in tenant admin, Cloud Default-Cloud
        [1]: vrfcontext-69cacb9b-c92f-4432-b038-57a4ce1196d2#global in tenant admin, Cloud TEST-CLOUD
Select one: 0 # <------ Choose the target Cloud number
+------------------+-------------------------------------------------+
| Field            | Value                                           |
+------------------+-------------------------------------------------+
| uuid             | vrfcontext-f790b561-e7f2-49ee-b08c-425aac54e286 |
| name             | global                                          |
| static_routes[1] |                                                 |
|   prefix         | ###.##.0.0/24                                   |
|   next_hop       | 192.168.13.11                                   |
|   route_id       | workload-cluster-A-1                            |
|   labels[1]      |                                                 |
|     key          | clustername                                     |
|     value        | workload-cluster-A                              | # CHECK: Does this label value match the AKO "clusterName"?
| static_routes[2] |                                                 |
|   prefix         | ###.##.2.0/24                                   |
|   next_hop       | 192.168.13.47                                   | # CHECK: Is there any duplicated next_hop?
|   route_id       | workload-cluster-A-2                            |
|   labels[1]      |                                                 |
|     key          | clustername                                     |
|     value        | workload-cluster-A                              |
| static_routes[3] |                                                 |
|   prefix         | ###.##.1.0/24                                   |
|   next_hop       | 192.168.13.51                                   |
|   route_id       | workload-cluster-A-3                            |
|   labels[1]      |                                                 |
|     key          | clustername                                     |
|     value        | workload-cluster-A                              |
| static_routes[4] |                                                 |
|   prefix         | ###.##.3.0/24                                   |
|   next_hop       | 192.168.13.56                                   |
|   route_id       | workload-cluster-A-4                            |
|   labels[1]      |                                                 |
|     key          | clustername                                     |
|     value        | workload-cluster-A                              |
| system_default   | True                                            |
| lldp_enable      | True                                            |
| tenant_ref       | admin                                           |
| cloud_ref        | Default-Cloud                                   |
+------------------+-------------------------------------------------+
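
Because the global VRF can contain routes from more than one cluster, a quick way to spot stale entries is to compare the next_hop addresses carrying this cluster's clustername label with the current node IP addresses. The following is a rough sketch, assuming API basic authentication is enabled, jq is installed, and the example cluster name workload-cluster-A from above; AVI_PASSWORD and ALB_CONTROLLER_IPADDR are placeholder variables.

# Collect the InternalIP of every current node
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' > /tmp/node-ips.txt

# Flag next_hops labelled clustername=workload-cluster-A that no longer match a current node
curl -sk -u "admin:${AVI_PASSWORD}" "https://${ALB_CONTROLLER_IPADDR}/api/vrfcontext?name=global" \
  | jq -r '.results[].static_routes[]?
           | select(any(.labels[]?; .key=="clustername" and .value=="workload-cluster-A"))
           | .next_hop.addr' \
  | while read -r hop; do
      grep -qx "$hop" /tmp/node-ips.txt || echo "possible stale route: next_hop $hop"
    done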