TAS and Isolation Segment with routing-release 0.259.0 fail to prune stale routes on gorouter.

Article ID: 298414

Updated On: 01-14-2025

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The issue is described in the TAS and Isolation Segment release notes:

Note: This version of TAS for VMs contains a known issue with Gorouter error handling for backend app requests. Failures that previously returned HTTP Status Codes 496, 499, 503, 525, or 526 may instead return 502. Additionally, stale routes may fail to be pruned properly, which could result in apps unexpectedly returning HTTP Status Code 502.

The impact on production systems is most likely an increase in 502 responses in the logs. Most of these are failures that previously returned other status codes and are now reported as 502, which is a nuisance but not an outage. The more serious edge case is that the number of stale routes grows because pruning fails. Gorouter retries other backends according to its configuration settings, and once it runs out of retries it returns the 502 to the end user. If many routes are left unpruned, end users can therefore see 502 errors. The mitigation for unpruned routes is a rolling restart of the Gorouter instances.
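To gauge how many 502 responses Gorouter is returning, an access log can be tallied by status code. The following is a minimal sketch, not an official tool: `count_status` is a hypothetical helper, and it assumes that the status code is the first field after the quoted request line in each access log entry. Verify that assumption against your own logs before relying on the numbers.

```shell
# count_status CODE: reads access log lines on stdin and prints how many
# of them carry the given HTTP status code.
# Assumption (not confirmed by this article): each line looks like
#   host - [timestamp] "METHOD /path HTTP/1.1" STATUS ...
count_status() {
  awk -F'"' -v code="$1" '
    {
      # With a double quote as the field separator, $3 is the text that
      # follows the closing quote of the request line.
      split($3, rest, " ")
      if (rest[1] == code) n++
    }
    END { print n + 0 }
  '
}
```

For example, `count_status 502 < ACCESS-LOG-COPY` prints the number of 502 entries in a copy of a Gorouter access log. Comparing the count before and after a rolling restart gives a rough indication of whether stale routes were contributing.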

The issue is caused by the Golang upgrade from v1.19 to v1.20. As a result, Gorouter now returns only HTTP 502 when backend apps have certificate problems. Previously it returned different status codes, set the X-CF-Router-Error header with information about the type of certificate problem encountered, and pruned the stale route accordingly.

The following TAS and Isolation Segment releases are impacted:

  • TAS v3.0.8
  • TAS v2.13.18
  • TAS v2.12.25 (End of General Support)
  • TAS v2.11.36
  • Isolation Segment 3.0.8
  • Isolation Segment 2.13.15
  • Isolation Segment 2.12.25 (End of General Support)
  • Isolation Segment 2.11.30


Environment

Product Version: 2.13

Resolution

Engineering has released routing-release v0.266.0 to address the issue, but it will take time for the next TAS and Isolation Segment minor versions to include the fixed routing-release.

Workaround until a fixed release is available:
1. SSH into the Ops Manager VM. For more information, refer to Logging Into Ops Manager VMs with SSH.

2. Download the patched routing 0.266.0 release to the Ops Manager VM:

sudo -u tempest-web wget -P /var/tempest/releases/ https://github.com/cloudfoundry/routing-release/releases/download/v0.266.0/routing-0.266.0.tgz
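Before moving on, it can be worth confirming that the download completed and is a readable archive. The sketch below is one way to do that; `check_release_tarball` is a hypothetical helper name, and the check only proves that the file is a non-empty gzip-compressed tar archive, not that its contents are the expected release.

```shell
# check_release_tarball FILE: succeeds only if FILE exists, is non-empty,
# and can be listed as a gzip-compressed tar archive.
check_release_tarball() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  if ! tar -tzf "$1" >/dev/null 2>&1; then
    echo "not a readable .tgz archive: $1" >&2
    return 1
  fi
  echo "ok: $1"
}
```

A call such as `check_release_tarball /var/tempest/releases/routing-0.266.0.tgz` should print `ok:` followed by the path. If it reports an error, delete the file and repeat the download.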

3. Find the paths of the YAML files that define the versions of the TAS or Isolation Segment tile in your library. The commands below print the matching .yml files, with paths that look something like:

  • "/var/tempest/workspaces/default/metadata/c5c28c298f9f.yml"
  • "/var/tempest/workspaces/default/metadata/product-template-c5c28c298f9f.yml"
Run the command for your product:
(for TAS)
sudo grep -l "^name: cf" /var/tempest/workspaces/default/metadata/*
(for Isolation Segment)
sudo grep -l "^name: p-isolation-segment" /var/tempest/workspaces/default/metadata/*

4. Confirm the tile version you’re using by running the following command on each full file path. If the previous step returned more than one file, run it on each one to identify the version that is currently deployed. That is the file you will edit in the next steps.

sudo head FULL-FILE-PATH 
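When several metadata files match, the search and the `head` inspection can be combined into one pass. This is a minimal sketch under stated assumptions: `list_tile_metadata` is a hypothetical helper, it reuses the same `^name:` pattern as the grep commands above, and it must run in a shell that can read the metadata directory (the commands above use sudo for that reason).

```shell
# list_tile_metadata DIR NAME: for every regular file in DIR that has a
# line starting with "name: NAME", print its path and its first 10 lines.
list_tile_metadata() {
  dir="$1"
  name="$2"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    if grep -q "^name: ${name}" "$f"; then
      echo "== $f"
      head "$f"
    fi
  done
}
```

For TAS the call would be `list_tile_metadata /var/tempest/workspaces/default/metadata cf`, and for Isolation Segment the second argument would be `p-isolation-segment`. Each `==` line marks the start of one file, so the version shown beneath it can be matched to its full path.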

5. Back up this YAML file to the ubuntu user's home directory. If you need to revert the workaround later, restore this backup over the file you’re about to edit.

sudo cp FULL-FILE-PATH ~ubuntu/ 
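Because the backup is the only way to revert the edit, it may help to confirm that the copy is byte-for-byte identical before changing anything. The sketch below shows one way to do that; `backup_file` is a hypothetical helper, and it needs to run with enough privileges to read the source file.

```shell
# backup_file SRC DEST_DIR: copies SRC into DEST_DIR (keeping its file
# name and timestamps) and verifies that the copy matches the original.
backup_file() {
  src="$1"
  dest_dir="$2"
  dest="$dest_dir/$(basename "$src")"
  cp -p "$src" "$dest" || return 1
  if cmp -s "$src" "$dest"; then
    echo "backup verified: $dest"
  else
    echo "backup differs from source: $dest" >&2
    return 1
  fi
}
```

To revert the workaround later, copy the backup back over the original path with `cp`, as described in step 5.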

6. Edit the YAML file (using “sudo editor-of-choice”, such as “emacs”, “vi”, or “nano”) and make the following changes to the routing release entry:

  • Update the "version" and "file" keys with the new values indicated below.
  • Remove the entire "exported_from" block.

before change

- name: routing
  version: 0.259.0
  file: routing-0.259.0-ubuntu-xenial-621.448.tgz
  exported_from:
  - os: ubuntu-xenial
    version: '621.448'

after change

- name: routing
  version: 0.266.0
  file: routing-0.266.0.tgz
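After saving the file, the routing entry can be checked without opening the editor again. The following is a rough sketch rather than a YAML parser: `check_routing_entry` is a hypothetical helper that assumes the entry is laid out as in the before and after examples, with a `- name: routing` line followed by indented keys.

```shell
# check_routing_entry FILE: prints the version and file of the routing
# release entry and reports whether an exported_from block is still there.
# Exit status: 0 if exported_from is absent, 1 if it is present,
# 2 if no routing entry was found.
check_routing_entry() {
  awk '
    /^[[:space:]]*- name: routing[[:space:]]*$/ { in_block = 1; found = 1; next }
    # The next release entry or a top-level key ends the routing entry.
    in_block && /^[[:space:]]*- name:/ { in_block = 0 }
    in_block && /^[^[:space:]-]/ { in_block = 0 }
    # Keep only the first version, so the stemcell version nested under
    # exported_from does not overwrite the release version.
    in_block && /^[[:space:]]*version:/ && version == "" { version = $2 }
    in_block && /^[[:space:]]*file:/ && file == "" { file = $2 }
    in_block && /^[[:space:]]*exported_from:/ { exported = 1 }
    END {
      if (!found) { print "routing entry not found"; exit 2 }
      printf "version=%s file=%s exported_from=%s\n", version, file, (exported ? "present" : "absent")
      exit (exported ? 1 : 0)
    }
  ' "$1"
}
```

Run against a correctly edited file, `check_routing_entry FULL-FILE-PATH` prints `version=0.266.0 file=routing-0.266.0.tgz exported_from=absent` and exits with status 0. Any other output means the edit in step 6 is incomplete.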

7. Apply Changes to the modified tile.  

Note: The route_registrar job depends on routing-release, so the deployment will update every instance that runs the route_registrar job. Diego cells won’t be updated.