How to troubleshoot app access issues in VMware Tanzu Application Service (TAS) for VMs
search cancel

How to troubleshoot app access issues in VMware Tanzu Application Service (TAS) for VMs

book

Article ID: 298068

calendar_today

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

  • While accessing apps on the Tanzu Application Service (TAS) platform, the client (other app or browser) receives the HTTP status code 401, 403, 500, 502 etc.


The access from your client to the app in the container involves multiple components and multiple points where the failure could occur: client (app or browser) > Load Balancer (LB) > Gorouter > Diego Cell > Envoy Proxy > app in container


This article covers how to troubleshoot and narrow down the problem scope. 


Environment

Product Version: 2.8

Resolution

Narrow Down Scope 

To start, access the app from the nearest place (inside of container), then move outward from the container step by step, until you identify at which stage the problem is introduced.

For example, in this article, the problematic app is named “test” and deployed with route “test.mydomain.com”.


Test from the App Container

Access the app inside of the container. This step helps to break down the problem and identify whether it is an app issue or platform issue. The test cases in this article are with `curl` GET request at /PATH, please replace HTTP method / path, and add necessary header / data accordingly for your own app.

1. Use ssh to get into the app container:

$ cf ssh test 

2. Access the app port directly in the app container:
$ curl -v http://localhost:8080/PATH

3. Access the app via Envoy in the app container:
$ curl -v -k  https://localhost:61001/PATH


If both fail, then the application is having problems. If just step #3 fails, then there is an issue with Envoy.

If these commands all succeed, then the app is working and the problem is elsewhere. Continue troubleshooting. 


Test from the Diego Cell 

The next step outward is the Diego Cell. We'll continue by testing access to the app from the Diego Cell.

1. Fetch the Diego Cell IP:
$ cf curl /v2/apps/$(cf app test --guid)/stats
{
   "0": {
      "state": "RUNNING",
      "isolation_segment": null,
      "stats": {
         "name": "test",
         "uris": [
            "test.mydomain.com"
         ],
         "host": "10.197.2.25",
         "port": 61072,
         ...
         }
      }
   }
}

2. Fetch the app port. In this example, 61072/61078/61074 are HTTP/HTTPS/SSH external ports, 8080/61001/61002 are app/envoy/sshd ports on the container network.

Note: You will use the HTTP and HTTPS external ports in a future step.
$ cf ssh test -c 'echo $CF_INSTANCE_PORTS'
[{"external":61072,"internal":8080,"external_tls_proxy":61078,"internal_tls_proxy":61001},{"external":61074,"internal":2222,"external_tls_proxy":61080,"internal_tls_proxy":61002}]


In case `cf ssh` is being disabled on the platform, please locate the Diego Cell (with IP 10.197.2.25) and check its NAT rules. In below example, export port 61078 maps container app HTTP port 8080, and external port 61080 maps container envoy HTTPS port.

$ bosh -d DEPLOYMENT ssh diego_cell/<INDEX> -c "sudo iptables -L -t nat"
...
DNAT	tcp  --  anywhere	b649d48b-97f7-4ba4-8b25-d18d33d9fec9.diego-cell.infra.cf-f611e346a9cfc876c06f.bosh  tcp dpt:61072 to:10.255.128.38:8080
DNAT	tcp  --  anywhere	b649d48b-97f7-4ba4-8b25-d18d33d9fec9.diego-cell.infra.cf-f611e346a9cfc876c06f.bosh  tcp dpt:61074 to:10.255.128.38:2222
DNAT	tcp  --  anywhere	b649d48b-97f7-4ba4-8b25-d18d33d9fec9.diego-cell.infra.cf-f611e346a9cfc876c06f.bosh  tcp dpt:61078 to:10.255.128.38:61001
DNAT	tcp  --  anywhere	b649d48b-97f7-4ba4-8b25-d18d33d9fec9.diego-cell.infra.cf-f611e346a9cfc876c06f.bosh  tcp dpt:61080 to:10.255.128.38:61002
...


3. Then `bosh ssh` into the Diego Cell which hosts the container:

$ curl http://localhost:61072/PATH
$ curl -k -v https://localhost:61078/PATH

4. Run the following to access the Diego Cell from Gorouter VM. Use the IP and port found from step #1 and #2 above.
$ curl -v http://10.197.2.25:61072/PATH
$ curl -k -v https://10.197.2.25:61078/PATH


If these commands all succeed, then port forwarding on the Cell is working OK & the problem is elsewhere. Continue troubleshooting.

If the first command fails, there is an issue forwarding plain HTTP traffic to the app.
If the second fails, there is an issue forwarding HTTPS traffic onto Envoy.


Test against Gorouter

If the app can be accessed successfully with the above steps, then the problem is neither with app container and not with Diego Cell. The next step is to test against Gorouter endpoint with the Host attribute.

Use bosh ssh to access one of your Gorouter instance, it doesn't matter which one. Then run this command:

$ curl -k -v -H “Host: test.mydomain.com” https://GOROUTER_IP/PATH


If this command fails, then there is an issue with Gorouter or it's routing tables. See the section below on how to dump the Gorouter routing tables and how to review the Gorouter logs.


Check Load Balancer

If there are no problems with above steps, the problem could be with load balancer or something else. These are outside of the platform's scope and you need to talk with your Network or Load Balancer Admin to get assistance testing from these additional hops in the request path.


HTTP Status Code

502 Bad Gateway usually occurs in two scenarios

  • when the backend app is too slow to respond to requests. The Gorouter couldn’t get response until its backend timeout (configurable at network settings with default as 900s), and Gorouter returns 502 to the client.
  • Diego Instance Identity Root CA or Intermediate CA expired. In this case, 502 is returned immediately. Please find the CAs at ca_certs in /var/vcap/jobs/gorouter/config/gorouter.yml, and check if they expired. 

503 Service Unavailable one scenario is due to time gap between Gorouter and Diego cell. 

  • When either Gorouter or Diego cell fail to synchronize system clock. Gorouter will fail to complete TLS handshake with Diego container due to error in gorouter.stderr.log "http: proxy error: x509: certificate has expired or is not yet valid: current time YYYY-MM-DDTHH:MM:SSZ is before YYYY-MM-DDTHH:MM:SSZ". 

404 Not Found 

There are two types of "not found" responses:

1. Hostname doesn't exist in Gorouter routing table. In this case, Gorouter returns 404 with response body "404 Not Found: Requested route ('mydomain.com') does not exist."

2. Otherwise, the app receives the request but no handler for handling the requested path.

500 Internal Server Error 

  • The platform itself doesn't generate 500 response code, it's returned by apps, thus as next steps, it's recommended to review app status and logs. 

Tips with Gorouter

1. Fetch the routing table, on any Gorouter VM.

$ /var/vcap/jobs/gorouter/bin/retrieve-local-routes | jq . 
...
  "test.mydomain.com": [
    {
      "address": "10.197.2.25:61078",
      "tls": true,
      "ttl": 120,
      "tags": {
        "component": "route-emitter",
        "instance_id": "0",
        "process_id": "6b817f74-98d2-4602-a695-7fe5290ffb61",
        "process_instance_id": "b57df7c3-9b4c-4e01-7cd7-14b1",
        "process_type": "web",
        "source_id": "6b817f74-98d2-4602-a695-7fe5290ffb61"
      },
      "private_instance_id": "b57df7c3-9b4c-4e01-7cd7-14b1",
      "server_cert_domain_san": "b57df7c3-9b4c-4e01-7cd7-14b1"
    }
  ],
  ...

2. Test connection from Gorouter to container:

# copy ca_certs content from /var/vcap/jobs/gorouter/config/gorouter.yml, format it into /tmp/ca.pem

# this will fail because certificate subject name doesn't match, but it can tell if server certificate verification succeeds or not
$ curl -v https://10.197.2.25:61078 --cacert /tmp/ca.pem
...
*      server certificate verification OK
...
curl: (51) SSL: certificate subject name (1f20ff49-0d5e-4694-6d7c-4667) does not match target host name '10.213.60.22'

# test with -k to skip SSL validation, this should succeed
$ curl -v -k https://10.197.2.25:61078

 

3. Read the Goroute Logs:

2020/03/18 12:37:12 http: proxy error: x509: certificate is not valid for any names, but wanted to match a8a190e7-01c7-419d-4386-a1fb


The Gorouter connects to the backend container and talks to the envoy proxy over TLS. To avoid connecting to wrong containers, Gorouter verifies that the envoy certificate SAN matches the SAN in the routing table. If it doesn’t match, Gorouter will delete the route from its routing table because the backend container is for a different app. 

test.mydomain.com - [2020-03-31T23:41:35.633829784Z] "GET /xxxx?xxxxxx=5000 HTTP/1.1" 200 0 107 "-" "Typhoeus - https://github.com/typhoeus/typhoeus" "10.197.2.250:41198" "10.197.2.18:9024" x_forwarded_for:"10.197.2.25, 10.197.2.250" x_forwarded_proto:"https" vcap_request_id:"e9550647-11aa-4c79-592a-5fb12d9eddf5" response_time:0.010781 gorouter_time:0.000211 app_id:"-" app_index:"-" x_b3_traceid:"2e7cbc3649835a0c" x_b3_spanid:"2e7cbc3649835a0c" x_b3_parentspanid:"-" b3:"2e7cbc3649835a0c-2e7cbc3649835a0c"

The response_time in Gorouter access logs is the app response time, it's vital information to troubleshoot app slowness, if the slowness is only with certain apps, the problem is mostly caused by the app itself rather than the platform.

The gorouter_time is identical to round-trip time, app response_time, which is the time consumed on Gorouter itself. Please refer to this article - Troubleshooting Slow Requests for more details.