Smoke test fails on 1 or more isolation segments after upgrading to Tanzu Application Service for VMs 2.9+

Article ID: 298110

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The smoke test fails on one or more isolation segments after upgrading Tanzu Application Service for VMs from v2.8.14 to v2.9.0.

Symptoms

  • The smoke test app route points to an unexpected domain. In other words, the wrong domain is mapped to an isolation segment.
  • You see the error "Error: Response exceeded maximum allowed length".
  • Your foundation is on Tanzu Application Service for VMs (TAS for VMs) v2.9 or higher.

For example, you may see the following error:

Task 1454614 | 18:35:54 | Preparing deployment: Preparing deployment (00:00:13)
Task 1454614 | 18:36:07 | Running errand: clock_global/b6d5a1d3-aa41-45f6-ae41-18a5a3d5d1f8 (0) (00:06:23)
         L Error: Response exceeded maximum allowed length
Running errand 'smoke_tests':
   Expected task '1454614' to succeed but state is 'error'
Exit code 1
Task 1454614 | 18:42:30 | Error: Response exceeded maximum allowed length


This error is not specific to the domain problem. It only indicates that the error text returned by the errand exceeded 1 MB.

Running the smoke test errand from the clock_global VM surfaced this error:

------------------------------
• Failure [370.129 seconds]
Application Workflow Linux Applications [It] can be pushed, scaled and deleted
/var/vcap/packages/smoke-tests/src/base_test.go:21

Timed out after 300.000s.
Unable to make an HTTP connection to the first app instance.
Expected :
  -1 to equal : 0

/var/vcap/packages/smoke-tests/src/base_test.go:50
------------------------------
SSSSSS


This error points to a problem with connecting to the smoke test app. You may see output similar to the following from the smoke test during the "pushing an application" step.

```
cf7 push SMOKES-APP-f72a9719-4f9d -b ruby_buildpack -p assets/ruby_simple --random-route
```


The app is pushed successfully:

```
OUT: name: SMOKES-APP-f72a9719-4f9d
OUT: requested state: started
OUT: isolation segment: appsec
OUT: routes: SMOKES-APP-f72a9719-4f9d-reliable-armadillo-jm.apps.sandbox.pcf.domain.net
OUT: last uploaded: Wed 12 Aug 20:41:04 UTC 2020
OUT: stack: cflinuxfs3
OUT: buildpacks: ruby
OUT:
OUT: type: web
OUT: sidecars:
OUT: instances: 1/1
OUT: memory usage: 1024M
OUT: start command: bundle exec rackup config.ru -p $PORT
OUT: state since cpu memory disk details
OUT: #0 running 2020-08-12T20:41:15Z 0.0% 42.7K of 1G 24M of 1G
```


The smoke test deployed the app with the route `SMOKES-APP-f72a9719-4f9d-reliable-armadillo-jm.apps.sandbox.pcf.domain.net`. However, this is not the domain that was configured for the isolation segment. The route should use the domain `appsec-apps.sandbox.pcf.domain.net`.

If the smoke test errors are ambiguous, run the errand from the clock_global VM:

  • bosh -d DEPLOYMENT-NAME ssh clock_global/0 (or clock_global/INSTANCE-GUID), where DEPLOYMENT-NAME is the BOSH deployment name of TAS for VMs
  • sudo su -
  • /var/vcap/jobs/smoke_tests/bin/run > /tmp/output.txt (redirecting output to a file)

This allows you to detect errors that point to the root cause, such as an incorrect route.
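As a rough aid for reviewing the saved output, the following sketch lists every route in the file that does not fall under the domain you expect for the isolation segment. The function name `find_unexpected_routes` is hypothetical, and the sketch assumes that each route appears alone on a line containing `routes:`, as in the sample output above.

```shell
# Print each route in a saved smoke test log that is NOT under the expected
# domain. Returns 0 when every route matches, 1 when at least one does not.
# Assumption: one route per "routes:" line, as in "OUT: routes: <route>".
find_unexpected_routes() {
  # $1: saved smoke test output, $2: domain the isolation segment should use
  awk -v domain=".$2" '
    /routes: *[^ ]/ {
      route = $NF   # the route is the last field on the line
      # Compare the tail of the route with the domain as plain text,
      # so dots are not treated as regular expression wildcards.
      if (substr(route, length(route) - length(domain) + 1) != domain) {
        print route
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

For the example in this article, `find_unexpected_routes /tmp/output.txt appsec-apps.sandbox.pcf.domain.net` would print the `apps.sandbox.pcf.domain.net` route and return a non-zero status.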

If access logging is not enabled on the routers for the isolation segment, enable it and Apply Changes. This allows you to see whether the smoke test app calls hit those routers.
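Once access logging is enabled, a small helper like the one below can count how many requests for the smoke test route reached a given router. This is only a sketch: the function name `count_router_hits` is hypothetical, and it assumes that each access log line begins with the request host, which you should confirm against the log on your own router VM.

```shell
# Count access log entries whose first field is the given route hostname.
# Assumption: each access log line starts with the request host.
count_router_hits() {
  # $1: Gorouter access log file, $2: full route hostname of the smoke test app
  awk -v host="$2" '$1 == host { hits++ } END { print hits + 0 }' "$1"
}
```

On a router VM this might be called as `count_router_hits /var/vcap/sys/log/gorouter/access.log ROUTE-HOSTNAME`, where the log path is an assumption about the usual Gorouter log location. A count of 0 on the isolation segment routers suggests the smoke test calls are not reaching them.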


Environment

Product Version: 2.9

Resolution

If you see that the smoke test for the TAS for VMs tile creates a test app in the default TAS app domain, but the route uses a domain that points to the isolation segment, then you are hitting a known issue.

This issue is caused by the removal of a deprecated property that set the app domain for the smoke tests (`.properties.smoke_tests.specified.apps_domain`). This property was present in TAS for VMs versions 2.8.x and earlier, but it has since been removed. The smoke test now defaults to the first domain in the output of `cf domains`. As a result, the app lands on a TAS for VMs Diego Cell while the domain is mapped to the isolation segment load balancer and Gorouter, which has no routing table entry for the route of that app instance (AI).
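To see which domain the smoke test is likely to pick, you can look at the first entry that `cf domains` prints. The sketch below extracts it from the command output. The function name `first_cf_domain` is hypothetical, and the parsing assumes the table has a header row whose first column is `name`, followed by one domain per row.

```shell
# Read `cf domains` output on stdin and print the first domain listed.
# Assumption: a header row starting with "name" precedes the domain rows.
first_cf_domain() {
  awk '
    header_seen && NF { print $1; exit }   # first non-empty row after the header
    $1 == "name"      { header_seen = 1 }  # header row of the domains table
  '
}
```

A typical call would be `cf domains | first_cf_domain`. If the result is not the domain you expect the smoke test app to use, that is consistent with the behavior described above.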

This issue is fixed in TAS for VMs 2.9.21, 2.10.13, and 2.11.1. The fix allows the operator to provide the apps_domain property when deploying TAS for VMs and also properly configures the user-provided space when deploying an isolation segment.