Workload Management tab fails to load.

Article ID: 409338

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • The Workload Management tab fails to load.
  • All services on the vCenter Server are running, and no core dumps are created.
  • The following entries appear in the logs on the vCenter Server:

    /var/log/vmware/sso/websso.log:

    YYYY-MM-DDTHH:MM:SS INFO websso[73:tomcat-http--27] [CorId=########-####-####-####-#########] [com.vmware.identity.samlservice.impl.ExternalIdpProvider] Got exception (sleeping before retry)

    com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
            at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpUtil.validateHttpResponse(ApacheHttpUtil.java:101) ~[vapi-runtime-2.100.0.jar:?]

    /var/log/vmware/vapi/endpoint/endpoint.log:

    YYYY-MM-DDTHH:MM:SS| WARN  | vAPI-I/O dispatcher-0   | ApiMethodSession  | Error was thrown while running close session: com.vmware.vapi.endpoint.vapi.ApiMethodSession$######

    com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded

  • Run the following commands on the vCenter Server to confirm that this is the same problem:

    • zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l
    • If the result is greater than 0, run the command that matches the vCenter version:

      On vCenter 8.0U3:

      zgrep envoy_server_memory_heap_size{} /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 1052266987}'

      On vCenter 9.0:

      zgrep envoy_overload_envoy_resource_monitors_fixed_heap_pressure /var/log/vmware/vstats/metrics/ENVOY_SIDECAR* | grep -v "# TYPE" | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 98}'

    • If the command outputs 1, the envoy-sidecar has reached its memory threshold.
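The constant 1052266987 in the 8.0U3 check equals 98% of the default 1 GB heap limit (1073741824 bytes), rounded down, which lines up with the 0.98 fixed_heap threshold in the envoy-sidecar configuration and with the value 98 used in the 9.0 check. Assuming that reading is correct, the comparison value no longer applies once the heap limit has been raised. The sketch below only illustrates the arithmetic; heap_threshold is a hypothetical helper name, not a vCenter tool.

```shell
#!/bin/bash
# heap_threshold LIMIT_BYTES [PERCENT]
# Prints the heap size (in bytes, rounded down) at which PERCENT of
# LIMIT_BYTES is reached. PERCENT defaults to 98 (assumption: this mirrors
# the 0.98 fixed_heap threshold in the envoy-sidecar config).
heap_threshold() {
    local limit_bytes="$1"
    local percent="${2:-98}"
    echo $(( limit_bytes * percent / 100 ))
}

# Heap limits mentioned in this article: 1 GB (default), 2 GB, 4 GB.
for limit in 1073741824 2147483648 4294967296; do
    echo "limit=${limit} threshold=$(heap_threshold "$limit")"
done
```

For the three limits above this prints thresholds of 1052266987, 2104533975 and 4209067950 bytes. On 8.0U3, the matching value would replace 1052266987 in the awk comparison if the check is repeated after the limit has been increased.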

 

Environment

vCenter Server 8.x
vCenter Server 9.x

Cause

The envoy-sidecar exhausts its configured heap memory limit and starts rejecting requests. vCenter internal workloads that depend on it then fail with HTTP 503 ("envoy overloaded") errors.

Resolution

Workaround:

  1. Note: Ensure that a valid backup or offline snapshot of the VCSA exists before implementing the workaround. Refer to "VMware vCenter in Enhanced Linked Mode pre-changes snapshot (online or offline) best practice".

  2. Log in to the vCenter via SSH.

  3. Create a backup of the envoy sidecar config file:

    # cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back

  4. Using sed, update the Envoy memory limit from 1073741824 (1 GB) to 2147483648 (2 GB):

    # sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml

  5. Restart envoy-sidecar:

    # service-control --restart envoy-sidecar

  6. In certain scenarios, a memory limit of 2 GB might be insufficient. In that case, increase the limit from 2 GB to 4 GB and restart envoy-sidecar:

    # sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
    # service-control --restart envoy-sidecar

  7. In rare scenarios, 4 GB of memory might still lead to memory exhaustion. In such cases, completely remove the following two overload actions:

    1. Open the configuration file in an editor (for example, vi /etc/vmware-envoy-sidecar/config.yaml) and delete these two entries:

          - name: "envoy.overload_actions.stop_accepting_requests"
            triggers:
              - name: "envoy.resource_monitors.global_downstream_max_connections"
                threshold:
                  value: 0.99
              - name: "envoy.resource_monitors.fixed_heap"
                threshold:
                  value: 0.98
          - name: "envoy.overload_actions.reject_incoming_connections"
            triggers:
              - name: "envoy.resource_monitors.fixed_heap"
                threshold:
                  value: 1.00

    2. After the two actions are removed, the overload_manager section of the YAML file should appear as follows:

      overload_manager:
        refresh_interval: 1s
        resource_monitors:
          - name: "envoy.resource_monitors.global_downstream_max_connections"
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.resource_monitors.downstream_connections.v3.DownstreamConnectionsConfig
              max_active_downstream_connections: 8000
          - name: "envoy.resource_monitors.fixed_heap"
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
              max_heap_size_bytes: 4294967296 # 4GB
        actions:
          - name: "envoy.overload_actions.shrink_heap"
            triggers:
              - name: "envoy.resource_monitors.fixed_heap"
                threshold:
                  value: 0.75
          - name: "envoy.overload_actions.disable_http_keepalive"
            triggers:
              - name: "envoy.resource_monitors.global_downstream_max_connections"
                threshold:
                  value: 0.8
              - name: "envoy.resource_monitors.fixed_heap"
                threshold:
                  value: 0.95
          - name: "envoy.overload_actions.reduce_timeouts"
            triggers:
              - name: "envoy.resource_monitors.global_downstream_max_connections"
                scaled:
                  scaling_threshold: 0.25
                  saturation_threshold: 0.97
              - name: "envoy.resource_monitors.fixed_heap"
                scaled:
                  scaling_threshold: 0.85
                  saturation_threshold: 0.97
            typed_config:
              "@type": type.googleapis.com/envoy.config.overload.v3.ScaleTimersOverloadActionConfig
              timer_scale_factors:
                - timer: HTTP_DOWNSTREAM_CONNECTION_IDLE
                  min_timeout: 2s

  8. Save the file and restart the envoy-sidecar service:

    # service-control --restart envoy-sidecar
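
After the workaround has been applied, the resulting configuration can be checked without opening the file in an editor. The following is a minimal sketch, assuming the configuration path used in this article. The function names heap_limit_bytes, action_present and report are hypothetical helpers, not part of any vCenter tooling, and the script only reads the file.

```shell
#!/bin/bash
# Path taken from this article; override with CONFIG=/path/to/file if needed.
CONFIG="${CONFIG:-/etc/vmware-envoy-sidecar/config.yaml}"

# heap_limit_bytes FILE
# Prints the first max_heap_size_bytes value found in FILE (digits only,
# any trailing comment such as "# 4GB" is dropped).
heap_limit_bytes() {
    grep -E '^[[:space:]]*max_heap_size_bytes:' "$1" \
        | head -n 1 \
        | sed -E 's/^[^:]*:[[:space:]]*([0-9]+).*/\1/'
}

# action_present FILE ACTION
# Returns 0 if envoy.overload_actions.ACTION still appears in FILE.
action_present() {
    grep -qF "envoy.overload_actions.$2" "$1"
}

# report FILE
# Prints the heap limit and the state of the two actions removed in step 7.
report() {
    local file="$1"
    local action
    echo "max_heap_size_bytes: $(heap_limit_bytes "$file")"
    for action in stop_accepting_requests reject_incoming_connections; do
        if action_present "$file" "$action"; then
            echo "${action}: present"
        else
            echo "${action}: removed"
        fi
    done
}

# Only report when the config file exists (that is, when run on the VCSA).
if [ -f "$CONFIG" ]; then
    report "$CONFIG"
fi
```

If only steps 4 to 6 were applied, both actions are expected to be reported as present. After step 7, both should be reported as removed and the heap limit should read 4294967296.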