In AKO environment, pools are down due to health check failing with "Server Unreachable" error.

Article ID: 380731

Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

In an environment where AKO (Avi Kubernetes Operator) is deployed, pools may be marked down with the reason "Server unreachable".
The reason can be seen in the UI by navigating to the pool used by the virtual service, clicking on the "Servers" tab, and hovering your mouse over the red circle indicating the server state.

This error indicates that the Service Engine does not have a route to reach the pool server network. It can occur when the pool servers are not on the same L2 network as the Service Engine and no route to the pool server network has been applied.

Cause

This can be caused by labels being added to a Service Engine Group while the routes needed to reach the pool servers are in a VRF context that does not have a matching label.

Routes in a VRF will only be applied to Service Engines in a Service Engine Group that have a matching label.
When AKO is deployed and the serviceType is set to ClusterIP, labels will be applied to the Service Engine Group.
If the Default Gateway (or any needed route) does not have the same label as the Service Engine Group, the route will not be added to the service engine.
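
For example, for the default route to be pushed, the same key/value pair would need to appear on both the Service Engine Group and the VRF context (an illustrative sketch, not output from this environment):

| labels [1]                              |
|   key                                   | clustername
|   value                                 | test-cluster

If the Service Engine Group carries a label that the VRF context does not, the routes in that VRF context are not programmed on the Service Engines in that group.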

Resolution

First, identify the connectivity issue by connecting to the SE CLI, entering the namespace used for the data interfaces, and checking the routes.
You can use the admin account to SSH directly to the SE's management interface, or use a web console (in a vCenter environment) to access the SE CLI.
Once in the SE's CLI, verify the namespaces in use by entering the command:

ip netns

This will display the default namespace (avi_ns1) and any additional namespaces in use. (The avi_poll namespaces are not applicable to this troubleshooting.)
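
For example, the output may look similar to the following (the hostname and namespace list are illustrative and will vary by deployment):

admin@Avi-se-uqsvg:~$ ip netns
avi_ns1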

For example, to enter the avi_ns1 namespace, use the command:

sudo ip netns exec avi_ns1 bash

After authenticating, use the ifconfig command to verify you are in the namespace. You should see all the data interfaces (avi_eth1 - avi_eth9).
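
If the full ifconfig output is too long to read easily, a quick way to list only the data interface names is shown below (a minimal example; adjust the pattern if your interfaces are named differently):

root@Avi-se-uqsvg:/home/admin# ifconfig -a | grep -o '^avi_eth[0-9]*'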

Now verify there is no connectivity to the pool servers by trying to ping a pool server's IP address from within the namespace:

root@Avi-se-uqsvg:/home/admin# ping 10.206.21.10
ping: connect: Network is unreachable

If you get a "Network is unreachable" error, use the route command to check whether there is a route or default gateway to the pool server network.

root@Avi-se-uqsvg:/home/admin# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.206.252.0    0.0.0.0         255.255.252.0   U     0      0        0 avi_eth3

As shown above, there is no default route present and the pool server is in a different subnet, which is causing the "Server unreachable" error.
However, a default gateway is present in the configuration (which can be verified in the UI under Infrastructure > Cloud Resources > VRF Context).

This may indicate a labeling issue.

The next step is to verify which labels are applied to the Service Engine Group and to the VRF context.
To do this, SSH to the Controller CLI.
SSH to the Controller leader node and type shell to enter the Avi shell.
Enter the command below to return the Service Engine Group settings:

show serviceenginegroup [service engine group name]

If a label is present, it will appear in the list of settings returned:

| labels [1]                              |
|   key                                   | clustername
|   value                                 | test-cluster

Next, use the show vrfcontext command to check whether any labels are applied to the VRF context containing the route.

[admin:controller]: > show vrfcontext global

+------------------+-------------------------------------------------+
| Field            | Value                                           |
+------------------+-------------------------------------------------+
| uuid             | vrfcontext-6e5693f2-f28b-40b3-8dc0-7e1057e0e555 |
| name             | global                                          |
| static_routes[1] |                                                 |
|   prefix         | 0.0.0.0/0                                       |
|   next_hop       | 10.206.252.1                                    |
|   route_id       | 1                                               |
| system_default   | True                                            |
| lldp_enable      | True                                            |
| tenant_ref       | admin                                           |
| cloud_ref        | vCenter                                         |
+------------------+-------------------------------------------------+

Here you can see there is no "labels" setting for the VRF context. Therefore, this route will not be applied to the Service Engines in the group above, because the VRF context does not have a matching label.

A temporary workaround is to remove the label from the Service Engine Group.
This can be done in the same shell with the commands below (use the index number of the label as shown in the Service Engine Group settings):

[admin:controller]: > configure serviceenginegroup Default-Group
[admin:controller]: serviceenginegroup> no labels index 1
[admin:controller]: serviceenginegroup> save
[admin:controller]: >

After the label has been removed, there are two ways to have the route change picked up by the SEs.
1) The SE can be rebooted (when it comes back online, the route will be pushed to the SE).
2) The current route can be edited (removed and re-added, or changed to a different IP and then changed back). This will cause the route configuration to be re-pushed to the SE.

Once the route change has been pushed to the SE, you can access the SE namespace as outlined earlier and verify the route now exists.

root@Avi-se-uqsvg:/home/admin# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.206.252.1    0.0.0.0         UG    30000  0        0 avi_eth3
10.206.252.0    0.0.0.0         255.255.252.0   U     0      0        0 avi_eth3
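
Once the default route is present, you can also re-run the ping from within the namespace to confirm that the pool server (the example address used earlier) is now reachable:

root@Avi-se-uqsvg:/home/admin# ping -c 3 10.206.21.10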

Additional Information

When AKO is deployed, there is a values.yaml file which contains a setting for "serviceType".
This can be configured as NodePort, ClusterIP, or NodePortLocal. (These settings are case sensitive.)
When the serviceType is set to ClusterIP and disableStaticRouteSync is set to False, labels are required and will be added to the SE group.
When the serviceType is set to ClusterIP and disableStaticRouteSync is set to True, labels will not be added to the SE group.
Labels will not be added when using serviceType NodePort or NodePortLocal.
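
For reference, the relevant keys in a typical AKO values.yaml look similar to the following (a minimal excerpt; key names and layout can vary between AKO versions, so check the values.yaml shipped with your AKO release):

AKOSettings:
  clusterName: test-cluster          # used as the label value applied to the SE group
  disableStaticRouteSync: false      # false = static routes are synced and labels are required
L7Settings:
  serviceType: ClusterIP             # NodePort | ClusterIP | NodePortLocal (case sensitive)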

When deploying a TKG management cluster, there is an optional section where you can specify labels.
https://docs.vmware.com/en/VMware-Tanzu-for-Kubernetes-Operations/2.3/tko-reference-architecture/GUID-deployment-guides-tko-on-vsphere.html#deploy-tanzu-kubernetes-grid-tkg-management-cluster-19

"Cluster Labels: Optional. Leave the cluster labels section empty to apply the above workload cluster network settings by default. If you specify any label here, you must specify the same values in the configuration YAML file of the workload cluster. Else, the system places the endpoint VIP of your workload cluster in Management Cluster Data Plane VIP Network by default."

To determine whether labels will be added on AKO deployment, check the values.yaml file of the AKO deployment to see which serviceType is set, or, when deploying a TKG Management cluster, check the labels specified in the UI configuration prompts.
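
If you have kubectl access to the cluster, you can also check the running AKO configuration directly (this assumes the default AKO namespace avi-system and ConfigMap name avi-k8s-config; adjust the names for your deployment):

kubectl get configmap avi-k8s-config -n avi-system -o yaml | grep -iE 'servicetype|disablestaticroutesync'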