Experiencing slowness in loading the workload overview page in the TAP Developer Portal

Article ID: 427959



Products

VMware Tanzu Application Platform

Issue/Introduction

Users may experience slowness when loading the workload overview page in the TAP Developer Portal. This issue has several possible causes.

This article covers one of them: slowness caused by requests to one or more Run clusters timing out.

Cause

When investigating a performance issue related to the TAP Developer Portal, collect the following artifacts first.

  • A HAR file captured while the issue is occurring.
  • The TAP GUI Backstage server log. Run the following command:

    • $ kubectl logs deployment/server -n tap-gui --all-containers

  • The Envoy proxy pod log. Run the following command:

    • $ kubectl logs -n tanzu-system-ingress deployments/envoy --all-containers

In a scenario where the TAP Developer Portal loading slowness is caused by a particular Run cluster, a 504 Gateway Timeout error can be observed against that cluster in the Envoy proxy log.

$ cat ../envoy-proxy-pod-log.txt | grep "/api/kubernetes/proxy/apis/carto.run/v1alpha1/deliverables" | grep 504

"GET /api/kubernetes/proxy/apis/carto.run/v1alpha1/deliverables HTTP/2" 504 UT 0 24 14999 - ...

Here is a breakdown of the relevant fields in the above log message:

  • 504 = Gateway Timeout (HTTP response code)
  • UT = Upstream Timeout (Envoy response flag)
  • 14999 = request duration in ms; the request hit the 15s upstream timeout limit
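These fields follow Envoy's default access log format: after the quoted request line come the response code, response flags, bytes received, bytes sent, and duration in milliseconds. As a rough illustration, the following Python sketch pulls those fields out of a line like the one above (the regex assumes the default log format and is not part of any TAP tooling):

```python
import re
from typing import Optional

# Matches the default Envoy access-log fields that follow the request line:
# "<METHOD> <PATH> <PROTOCOL>" <status> <flags> <bytes_rx> <bytes_tx> <duration_ms>
LOG_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)"'
    r' (?P<status>\d{3}) (?P<flags>\S+)'
    r' (?P<rx>\d+) (?P<tx>\d+) (?P<duration_ms>\d+)'
)

def parse_envoy_line(line: str) -> Optional[dict]:
    """Return the interesting fields of one access-log line, or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("status", "rx", "tx", "duration_ms"):
        fields[key] = int(fields[key])
    return fields

line = ('"GET /api/kubernetes/proxy/apis/carto.run/v1alpha1/deliverables HTTP/2"'
        ' 504 UT 0 24 14999 - ...')
fields = parse_envoy_line(line)
```

A line that parses with `status == 504`, `flags == "UT"`, and `duration_ms` just under 15000 is the timeout signature described above.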

When checking the HAR file, a corresponding 504 error can be seen, similar to the following.

{
  "_connectionId": "1713",
  "_initiator": {
    "type": "script",
    "stack": {
      "callFrames": [
        {
          "functionName": "proxy",
          "scriptId": "82",
          "url": "https://TAP-GUI-FQDN/static/module-backstage.cf5c2313.js",
          "lineNumber": 62,
          "columnNumber": 37808
        }
      ],
      "parent": {
        "description": "await"
      }
    }
  },
  "_priority": "High",
  "_resourceType": "fetch",
  "cache": {},
  "connection": "443",
  "pageref": "page_1",
  "request": {
    "method": "GET",
    "url": "https://TAP-GUI-FQDN/api/kubernetes/proxy/apis/carto.run/v1alpha1/workloads",
    "httpVersion": "http/2.0",
    "headers": [
      {
        "name": ":authority",
        "value": "TAP-GUI-FQDN"
      },
      {
        "name": ":method",
        "value": "GET"
      },
      {
        "name": ":path",
        "value": "/api/kubernetes/proxy/apis/carto.run/v1alpha1/workloads"
      },
...
      {
        "name": "backstage-kubernetes-cluster",
        "value": "PROBLEMATIC-RUN-CLUSTER"
      },
      {
        "name": "priority",
        "value": "u=1, i"
      }
    ]
  },
  "response": {
    "status": 504,
    "statusText": "",
    "httpVersion": "http/2.0",
    "headers": [
      {
        "name": "content-length",
        "value": "24"
      },
      {
        "name": "content-type",
        "value": "text/plain"
      },
      {
        "name": "date",
        "value": "Wed, 29 Jan 2026 05:53:53 GMT"
      },
      {
        "name": "server",
        "value": "envoy"
      }
    ],
    "cookies": [],
    "content": {
      "size": 24,
      "mimeType": "text/plain",
      "text": "upstream request timeout"
...

Here, PROBLEMATIC-RUN-CLUSTER is the name of the Run cluster that is the likely root cause of the issue.
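Because a HAR file is JSON, the timed-out entries and the cluster header can also be extracted programmatically rather than by eyeballing the file. The following is a minimal Python sketch, assuming the standard HAR log/entries layout shown above (`find_timeouts` is an illustrative helper name, not part of any TAP tooling):

```python
import json

def find_timeouts(har: dict) -> list:
    """Return (cluster, url) pairs for HAR entries that received a 504."""
    hits = []
    for entry in har["log"]["entries"]:
        if entry["response"]["status"] != 504:
            continue
        # The Backstage Kubernetes plugin tags each proxied request with the
        # target cluster in this request header.
        cluster = next(
            (h["value"] for h in entry["request"]["headers"]
             if h["name"] == "backstage-kubernetes-cluster"),
            "<unknown>",
        )
        hits.append((cluster, entry["request"]["url"]))
    return hits

# Usage against a captured file:
# with open("portal.har") as f:
#     for cluster, url in find_timeouts(json.load(f)):
#         print(cluster, url)
```

If every 504 entry names the same cluster, that cluster is the prime suspect.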


Resolution

  • Easiest to debug: temporarily remove PROBLEMATIC-RUN-CLUSTER from the View cluster's TAP values and see whether load times improve. If they do, the next step is to isolate the debugging to that Run cluster.
  • Increase the Envoy timeout (temporary mitigation): if slow responses are expected, increase the upstream timeout for the TAP GUI routes from 15s to 60s in the TAP GUI HTTPProxy ingress configuration. This needs to be done with a ytt overlay, as follows:
# Add the following secret to tap-install ns
---
apiVersion: v1
kind: Secret
metadata:
  name: tap-gui-timeout-overlay
  namespace: tap-install
stringData:
  tap-gui-timeout-overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    
    #@overlay/match by=overlay.subset({"kind": "HTTPProxy", "metadata": {"name": "tap-gui"}}), expects="0+"
    ---
    spec:
      routes:
        #@overlay/match by=overlay.index(0)
        #@overlay/replace
        - services:
          - name: server
            port: 7000
          timeoutPolicy:
            response: "60s"
            idle: "120s"

# Add this to TAP Values
package_overlays:
- name: tap-gui
  secrets:
  - name: tap-gui-timeout-overlay
  • Check cluster connectivity: verify that the service account token and kubeconfig for PROBLEMATIC-RUN-CLUSTER in the TAP GUI configuration are valid and have not changed.
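To confirm that the Run cluster's API server is the slow hop, the same kind of request the portal proxies can be timed directly against the cluster, bypassing the portal. Below is a minimal Python sketch; the URL, token, and helper names are illustrative placeholders, and the 15s threshold mirrors the Envoy upstream timeout seen in the log above:

```python
import time
import urllib.request

ENVOY_TIMEOUT_S = 15.0  # default upstream timeout observed in the Envoy log

def timed_get(url: str, token: str, timeout: float = 60.0) -> float:
    """Issue an authenticated GET and return the elapsed seconds."""
    req = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout):
        pass
    return time.monotonic() - start

def would_time_out(elapsed_s: float, limit_s: float = ENVOY_TIMEOUT_S) -> bool:
    """True if a request this slow would hit Envoy's upstream timeout."""
    return elapsed_s >= limit_s

# Example (cluster API URL and token are placeholders):
# elapsed = timed_get(
#     "https://RUN-CLUSTER-API:6443/apis/carto.run/v1alpha1/deliverables",
#     token="<service-account-token>",
# )
# print(elapsed, would_time_out(elapsed))
```

If the direct request takes close to or beyond 15 seconds, the bottleneck is on the Run cluster side (API server load, large object counts, or network latency) rather than in the portal itself.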