Isolated Router VMs Delete or Unexpectedly Create During Upgrade - Resource/Isolation Segment

Article ID: 442503

Products

VMware Tanzu Application Service

Issue/Introduction

When upgrading from 10.3.x to Resource/Isolation Segment 10.4.0 or 10.4.1, one of two distinct failure modes can affect the isolated_router job:

  • Failure Mode 1 (Routers Deleted): Isolated routers are deleted during an upgrade to Resource/Isolation Segment 10.4.0 or 10.4.1. This occurs when the 10.4.0 or 10.4.1 configuration is generated with om config-template and applied with om configure-product. Reachability to the public Isolation Segment is lost, and the cloud platform backends for the load balancer appear empty. The update logs show the isolated_router instance group being scaled from its original count to zero, e.g.:
    instance_groups:
    - name: isolated_router
    - instances: 3
    + instances: 0
  • Failure Mode 2 (Unexpected Routers Created): Three unexpected router VMs are created when Apply Changes runs. This occurs when isolated_router was intentionally scaled to 0 in the 10.3.x environment and 10.4.0 or 10.4.1 is staged normally, without using om config-template.
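
As a rough aid for spotting Failure Mode 1, the following Python sketch scans a deployment diff in the form shown above for an instance group whose count drops to zero. The function name and the assumption that the log uses `-`/`+` prefixed `instances:` lines are illustrative only; real log formatting may differ.

```python
import re

# Matches "- name: isolated_router" style lines (YAML list dash or diff prefix allowed).
NAME_RE = re.compile(r"^[-+\s]*name:\s*(\S+)\s*$")
# Matches removed ("-") or added ("+") instance-count lines.
INSTANCES_RE = re.compile(r"^([-+])\s*instances:\s*(\d+)\s*$")


def find_scale_to_zero(diff_text, group="isolated_router"):
    """Return (old, new) if `group` is scaled from a positive count to 0, else None.

    Hypothetical helper: assumes the diff layout quoted in this article.
    """
    current = None
    old = None
    for raw in diff_text.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            old = None
            continue
        change = INSTANCES_RE.match(line)
        if change and current == group:
            sign, count = change.group(1), int(change.group(2))
            if sign == "-":
                old = count
            elif old is not None and old > 0 and count == 0:
                return old, count
    return None


sample = """\
instance_groups:
- name: isolated_router
- instances: 3
+ instances: 0
"""
print(find_scale_to_zero(sample))
```

A non-None result on a pending upgrade diff is a signal to stop and review the Enable routing VMs setting before changes are applied.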

 

Environment

  • Resource Segment Tile 10.4.0 and 10.4.1
  • om config-template

Cause

The 10.4.x release introduces a new boolean tile property, routing_enable_gorouter_vms ("Enable routing VMs"), which controls whether the isolated_router job deploys.

When set to false, a configuration constraint forces the isolated_router instance count to 0, deleting the VMs. When set to true, the constraint lifts and the instance count defaults to 3. A migration script handling the 10.3.x to 10.4.x upgrade path unconditionally sets this new property to true without inspecting the prior isolated_router instance count.

  • Cause for Failure Mode 1 (Routers Deleted): The routing_enable_gorouter_vms property lacks a default value in the tile configuration structure. Therefore, the om config-template command outputs the property as absent or false. Applying that configuration via om configure-product overrides the migration script's true value back to false. The constraint then forces isolated_router instances to 0, and Apply Changes deletes all router VMs.
  • Cause for Failure Mode 2 (Unexpected Routers Created): The migration script unconditionally sets routing_enable_gorouter_vms to true regardless of the prior instance count. With the property set to true, the constraint no longer applies, causing the instance count to reset to the tile default of 3.
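
The interaction between the migration, the property, and the constraint can be sketched as a small Python model. This is not the tile's actual code: the class and function names are hypothetical, and the model assumes that a non-zero prior instance count is preserved when the property is true, which the article does not state explicitly.

```python
from dataclasses import dataclass

TILE_DEFAULT_INSTANCES = 3  # default count cited in this article


@dataclass(frozen=True)
class TileState:
    enable_routing_vms: bool
    isolated_router_instances: int


def resolve_instances(enable, requested):
    """Model of the constraint: false forces 0; true falls back to the default
    when no positive count is requested (assumption: positive counts are kept)."""
    if not enable:
        return 0
    return requested if requested > 0 else TILE_DEFAULT_INSTANCES


def migrate_10_3_to_10_4(prior_instances):
    """Model of the shipped migration: sets the property to true unconditionally."""
    return TileState(True, resolve_instances(True, prior_instances))


def apply_om_config(state, config_value):
    """Model of om configure-product: an absent (None) or false value ends up as false."""
    enable = bool(config_value)
    return TileState(enable, resolve_instances(enable, state.isolated_router_instances))


# Failure Mode 1: 3 routers, then a config-template output without a true value.
mode1 = apply_om_config(migrate_10_3_to_10_4(3), None)
# Failure Mode 2: routers intentionally at 0 before the upgrade.
mode2 = migrate_10_3_to_10_4(0)
print(mode1, mode2)
```

In this model, Failure Mode 1 ends with the property false and zero instances, while Failure Mode 2 ends with the property true and three instances, matching the behavior described above.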

Resolution

This issue will be addressed in an upcoming patch release. The fix will contain a corrected migration script that sets the property value based on the prior instance count, and it will add a default value of true to the property configuration.
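
The intent of the planned fix can be illustrated with a minimal Python sketch. The real migration script is not shown in this article, so both function names below are hypothetical and only mirror the two behaviors described: deriving the property from the prior instance count, and defaulting to true when no value is supplied.

```python
def migrated_enable_routing_vms(prior_instances):
    """Enable routing VMs only if routers were deployed before the upgrade."""
    return prior_instances > 0


def effective_enable_routing_vms(configured_value):
    """Treat an absent value (None) as the default of true; keep explicit values."""
    return True if configured_value is None else bool(configured_value)


print(migrated_enable_routing_vms(3), migrated_enable_routing_vms(0))
print(effective_enable_routing_vms(None), effective_enable_routing_vms(False))
```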

To apply immediate workarounds on 10.4.0 or 10.4.1 before upgrading:

Workaround for Failure Mode 1 (Routers Deleted):

  1. Navigate to the Resource Segment tile in Ops Manager.

  2. Locate the Enable routing VMs property within the networking configuration.

  3. Ensure the checkbox is enabled (set to true).

  4. Verify the Resource Config for the Isolated Router reflects the correct number of instances (e.g., 3).

  5. Apply Changes.

Workaround for Failure Mode 2 (Unexpected Routers Created):

  1. Stage the 10.4.0 or 10.4.1 release.

  2. Before running Apply Changes, verify the Enable routing VMs property within the networking configuration.

  3. Ensure the checkbox is disabled (set to false).

  4. Verify the Resource Config for the Isolated Router and check that the number of instances is set to 0.

  5. Run Apply Changes.
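
For environments driven by om, a check along the following lines could be run against a product configuration (already parsed into a dictionary) before om configure-product and Apply Changes. It is a sketch under stated assumptions: the property path `.properties.routing_enable_gorouter_vms` is assumed rather than confirmed, integer instance counts are assumed, and `check_router_config` is a hypothetical helper, not part of om.

```python
# Assumed full property path; confirm it against the staged tile configuration.
ENABLE_KEY = ".properties.routing_enable_gorouter_vms"


def check_router_config(config, expected_instances):
    """Return a list of problems where the config disagrees with the intended router count."""
    props = config.get("product-properties", {})
    resources = config.get("resource-config", {})
    flag = props.get(ENABLE_KEY, {}).get("value")
    instances = resources.get("isolated_router", {}).get("instances")

    want_enabled = expected_instances > 0
    problems = []
    if flag is None:
        problems.append(f"{ENABLE_KEY} is absent; set it explicitly to {want_enabled}")
    elif flag != want_enabled:
        problems.append(f"{ENABLE_KEY} is {flag}, expected {want_enabled}")
    if instances != expected_instances:
        problems.append(
            f"isolated_router instances is {instances}, expected {expected_instances}"
        )
    return problems


# Example: a generated config that omits the property while 3 routers are wanted.
generated = {
    "product-properties": {},
    "resource-config": {"isolated_router": {"instances": 3}},
}
for problem in check_router_config(generated, expected_instances=3):
    print(problem)
```

An empty result means the property and the instance count agree with the intended state: true with a positive count for Failure Mode 1, or false with 0 for Failure Mode 2.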