Gorouter least-connection mode favors specific app instances

Article ID: 298415

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Gorouter supports two algorithms for balancing requests across backend app instances:
  • Round Robin - requests for a given route are spread evenly among the backend instances, one after another.
  • Least Connection - a request for a given route is forwarded to the backend instance with the fewest current connections, as tracked by gorouter. If several backend instances share the lowest connection count, one of them is chosen at random.
Gorouter defaults to Round Robin. More information about both algorithms can be found in the docs.

This KB details a recent discovery where gorouter configured with Least Connection mode can favor specific app instances. 

There are two parts to the Least Connection algorithm.
  1. Pick the endpoint with the fewest connections.
  2. If multiple instances share the fewest connections, pick a random one.
For every request, the algorithm generates a randomly ordered array of indices, one per backend instance of the given route. For example, if there are 5 instances of appA, the random array may look like [0,2,4,1,3] for one request and [3,2,4,1,0] for the next. The algorithm then iterates over the endpoints in the order given by the random array and fetches the connection stats for each backend instance. As it does so, it compares the connection counts to determine the instance with the fewest connections. This design is intended to always return the endpoint with the fewest connections and, when several instances share the same lowest count, to choose one of them at random.

It was recently discovered that the random object's internal state vectors can be cleared to 0. When this happens, it no longer generates random values and instead returns the same values over and over. This behavior has also been observed outside of Cloud Foundry; for example, this GitHub issue is relevant. If the random object generates the same values every time, the same endpoint can win the algorithm on every request whenever all instances have equal connection counts. Note that the Least Connection part of the algorithm remains valid and keeps working while the random object is in this state. Only the random tie-breaking is affected, because the same ordered list is generated for every request.

Environment

Product Version: 2.13

Resolution

The product team has created a patch based on available data and official documentation recommendations. The patch is anticipated to be part of routing-release v0.252.0 and is included starting with the following TAS versions:
[Image: Screenshot 2023-01-19 at 10.14.13 AM.png]

If upgrading to a patched TAS version is not yet possible, two workarounds are available:
  1. Switch to the Round Robin balancing algorithm. If this is not a preferred option, proceed to workaround 2.
  2. Use bosh restart on the gorouters that are in this state. A bosh restart is preferred because it runs the other lifecycle scripts, such as the drain script, which ensures a graceful restart of the gorouters while in-flight connections are rebalanced.