Envoy-sidecar hits memory limit, causing the vSphere Client to fail to load and the vAPI Endpoint service to become degraded

Article ID: 384498

Products

VMware vCenter Server

Issue/Introduction

  • The vSphere Client fails to load: the login page does not appear, and one of the following errors is shown as a banner message.

[400] An error occurred while processing the authentication response from the vCenter Single Sign-On server. Details: Status: urn:oasis:names:tc:SAML:2.0:status:Requester, sub status: null.

An error occurred while fetching identity providers. Please try again later. If problem persists, contact your administrator.

  • vCenter workloads may fail.
  • All vCenter services are running and no core dumps are generated. A vCenter reboot resolves the issue only temporarily.
  • Some services report healthy with warnings, for example:
    • vAPI Endpoint
      • Failed to retrieve SSO settings
      • Failed to login in SSO
      • Failed to retrieve VIM service URI from Lookup Service
  • The License (vmware-cis-license), vAPI Endpoint (vmware-vapi-endpoint) and VMware vSphere Profile-Driven Storage (vmware-infraprofile) services go into a degraded state (healthy with warnings).
  • The vCenter websso log (/var/log/vmware/sso/websso.log) reports "envoy overloaded" messages:

[YYYY-MM-DDTHH:MM] INFO websso[71:tomcat-http--33] [CorId=487fd2f5-####-####-####-12345677890] [com.vmware.identity.samlservice.impl.ExternalIdpProvider] Got exception (sleeping before retry) com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded

  • The vCenter ssoAdminServer log (/var/log/vmware/sso/ssoAdminServer.log) reports "envoy overloaded" messages:

    [YYYY-MM-DDTHH:MM] ERROR ssoAdminServer [2338 : pool-2-thread-503] [OpId=2d227973-####-####-####-0b0741b3fa61] [com.vmware.vcenter.tokenservice.providers.VcIdentityInfoProviderImpl] Failed to get identity provider matching domain VMwareID com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded

  • To further validate the issue, run the following command from an SSH session to the vCenter:

    zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l

    • If the result is not 0, run the command for your vCenter version:

      For vCenter 8.0 U3:

      zgrep envoy_server_memory_heap_size{} /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2|  sort -n | uniq | tail -1 | awk '{print $1 >= 1052266987}'

      For vCenter 9.0:

      zgrep envoy_overload_envoy_resource_monitors_fixed_heap_pressure /var/log/vmware/vstats/metrics/ENVOY_SIDECAR* | grep -v "# TYPE" | cut -d ' ' -f2|  sort -n | uniq | tail -1 | awk '{print $1 >= 98}'

      If the command prints 1, the envoy-sidecar memory limit has been reached.

  • The maximum heap size currently configured for the envoy-sidecar service is set in /etc/vmware-envoy-sidecar/config.yaml. With the default value, the output looks like this:

    # grep -C2 max_heap_size_bytes /etc/vmware-envoy-sidecar/config.yaml
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
            max_heap_size_bytes: 1073741824 # 1GB
      actions:
        - name: "envoy.overload_actions.disable_http_keepalive"
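
The constant 1052266987 in the vCenter 8.0 U3 check is 98% of the default 1073741824-byte limit, rounded down to a whole byte. If the limit has been changed, the threshold to compare against changes with it. The following is a minimal sketch of that arithmetic, assuming the config file contains a `max_heap_size_bytes: <number>` line as in the excerpt above. The function names `configured_heap_limit` and `overload_threshold` are illustrative and are not part of vCenter.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not shipped with vCenter.

# Print the first max_heap_size_bytes value found in the given config file.
configured_heap_limit() {
    local config_file="$1"
    awk '$1 == "max_heap_size_bytes:" { print $2; exit }' "$config_file"
}

# Print 98% of the given byte count, rounded down to a whole byte.
overload_threshold() {
    local limit_bytes="$1"
    echo $(( ${limit_bytes} * 98 / 100 ))
}
```

On a vCenter appliance this could be used as `overload_threshold "$(configured_heap_limit /etc/vmware-envoy-sidecar/config.yaml)"`. For the default limit the result is 1052266987, the same value used in the 8.0 U3 command.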

Environment

  • vCenter 8.x
  • vCenter 9.x

Cause

By default, envoy-sidecar is limited to 1 GB of heap memory. When the memory consumed by the envoy-sidecar service reaches 98% of that limit, the service starts returning overload responses (HTTP 503 "envoy overloaded"), which can cause internal vCenter workloads to fail.
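
To see which overload actions apply at a given heap usage, the fixed_heap thresholds from the overload_manager section of /etc/vmware-envoy-sidecar/config.yaml (listed in the workaround in this article) can be checked one by one. The sketch below is illustrative only: `triggered_heap_actions` is a hypothetical helper, it assumes an action becomes active once usage is at or above its threshold, and it ignores the scaled `reduce_timeouts` trigger and the connection-count triggers.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not shipped with vCenter.
# Thresholds (percent of max_heap_size_bytes) are copied from the
# fixed_heap triggers shown in this article's config.yaml excerpts.
triggered_heap_actions() {
    local used_bytes="$1" limit_bytes="$2"
    local entry name percent
    for entry in shrink_heap:75 disable_http_keepalive:95 \
                 stop_accepting_requests:98 reject_incoming_connections:100; do
        name="${entry%%:*}"
        percent="${entry##*:}"
        # Integer form of: used / limit >= percent / 100
        if [ $(( ${used_bytes} * 100 )) -ge $(( ${limit_bytes} * ${percent} )) ]; then
            echo "envoy.overload_actions.${name}"
        fi
    done
}
```

For example, `triggered_heap_actions 1052266988 1073741824` lists every action except `reject_incoming_connections`, which matches the point at which the 503 responses described above begin.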

Resolution

This issue is resolved in the following releases:

For vCenter 8.x, the issue is resolved in 8.0 U3h. Log in to the Broadcom Support Portal to download this patch.

For vCenter 9.x, the issue is resolved in 9.0.1.0. Log in to the Broadcom Support Portal to download this patch for your entitlement (VMware vSphere Foundation or VMware Cloud Foundation).

Workaround:

  1. Take snapshots of the vCenter or, for Enhanced Linked Mode (ELM), of every vCenter in the ELM group (see "vCenter in Enhanced Linked Mode pre-changes snapshot best practice" for guidance on taking snapshots of vCenters in ELM).

  2. Log in to the vCenter via SSH.

  3. Create a backup of the envoy sidecar config file:
    # cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back

  4. Using sed, update the Envoy memory limit from 1073741824 (1 GB) to 2147483648 (2 GB):
    # sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml

  5. Restart envoy-sidecar:
    # service-control --restart envoy-sidecar

  6. In some cases, an Envoy memory limit of 2 GB is not sufficient. If the issue persists, update the limit from 2 GB to 4 GB and restart the service:
    # sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
    # service-control --restart envoy-sidecar

  7. In rare cases, an Envoy memory limit of 4 GB may still not be enough. If so, completely remove the following two overload actions from the /etc/vmware-envoy-sidecar/config.yaml file:

    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.99
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98
    - name: "envoy.overload_actions.reject_incoming_connections"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 1.00

a. Edit the envoy-sidecar configuration file with the vi editor and remove the two overload actions (envoy.overload_actions.stop_accepting_requests and envoy.overload_actions.reject_incoming_connections):
    # vi /etc/vmware-envoy-sidecar/config.yaml

After the two overload actions are removed, the entire overload_manager section in the /etc/vmware-envoy-sidecar/config.yaml file should look like this:

overload_manager:
  refresh_interval: 1s
  resource_monitors:
    - name: "envoy.resource_monitors.global_downstream_max_connections"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.downstream_connections.v3.DownstreamConnectionsConfig
        max_active_downstream_connections: 8000
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 4294967296 # 4GB
  actions:
    - name: "envoy.overload_actions.shrink_heap"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.75
    - name: "envoy.overload_actions.disable_http_keepalive"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.8
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
    - name: "envoy.overload_actions.reduce_timeouts"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          scaled:
            scaling_threshold: 0.25
            saturation_threshold: 0.97
        - name: "envoy.resource_monitors.fixed_heap"
          scaled:
            scaling_threshold: 0.85
            saturation_threshold: 0.97
      typed_config:
        "@type": type.googleapis.com/envoy.config.overload.v3.ScaleTimersOverloadActionConfig
        timer_scale_factors:
          - timer: HTTP_DOWNSTREAM_CONNECTION_IDLE
            min_timeout: 2s

b. Save the file and exit (press ESC, type :wq!, press Enter).

c. Restart the envoy-sidecar service:
    # service-control --restart envoy-sidecar
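
Steps 3 and 4 of the workaround (and the same substitution in step 6) can be combined with a check that the expected value is actually present before and after the edit. The following is a minimal sketch under that assumption; `raise_heap_limit` is a hypothetical helper, not a vCenter tool, and it deliberately leaves the service restart to the operator.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not shipped with vCenter.
# Usage: raise_heap_limit <config_file> <old_bytes> <new_bytes>
raise_heap_limit() {
    local config_file="$1" old_bytes="$2" new_bytes="$3"

    # Stop if the expected current value is not in the file,
    # so that a silent no-op substitution is not mistaken for success.
    if ! grep -qw "max_heap_size_bytes: ${old_bytes}" "$config_file"; then
        echo "max_heap_size_bytes: ${old_bytes} not found in ${config_file}" >&2
        return 1
    fi

    # Keep a copy of the file as it was (same .back suffix as step 3).
    cp "$config_file" "${config_file}.back" || return 1

    # Same substitution as step 4.
    sed -i "s/max_heap_size_bytes: ${old_bytes}/max_heap_size_bytes: ${new_bytes}/g" \
        "$config_file" || return 1

    # Succeed only if the new value is now present.
    grep -qw "max_heap_size_bytes: ${new_bytes}" "$config_file"
}
```

A possible invocation is `raise_heap_limit /etc/vmware-envoy-sidecar/config.yaml 1073741824 2147483648 && service-control --restart envoy-sidecar`. Note that the substitution changes only the number, so a trailing comment such as `# 1GB` on the same line stays as it was; it is a comment only and does not affect the limit.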