vCenter workloads may fail or the customer may fail to log in to vCenter.
Services are up and running and no core dumps are created. A reboot resolves the issue, but the reason for the failure is not apparent.
An error occurred while fetching identity providers. Please try again later. If problem persists, contact your administrator

<date && time> INFO websso[71:tomcat-http--33] [CorId=487fd2f5-e5c1-4592-b292-12345677890] [com.vmware.identity.samlservice.impl.ExternalIdpProvider] Got exception (sleeping before retry) com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded

Envoy-sidecar is limited to 1 GB of memory. This can be seen in /etc/vmware-envoy-sidecar/config.yaml:

# cat /etc/vmware-envoy-sidecar/config.yaml | grep -C2 1073741824
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 1073741824 # 1GB
  actions:
    - name: "envoy.overload_actions.disable_http_keepalive"

When it reaches 98% of this memory, it starts sending overload responses, which may cause failures in the vCenter internal workloads.
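The 98% threshold can be expressed in bytes, which is where the constant 1052266987 in the 8.0U3 check comes from. The following is a minimal sketch; the function name `heap_threshold_bytes` is made up for illustration and is not part of vCenter.

```shell
#!/bin/sh
# heap_threshold_bytes LIMIT_BYTES PERCENT  (hypothetical helper)
# Prints the heap size, in bytes, at which PERCENT of LIMIT_BYTES is reached.
# Uses integer arithmetic, so the result is rounded down.
heap_threshold_bytes() {
    echo $(( $1 * $2 / 100 ))
}

heap_threshold_bytes 1073741824 98   # 1 GB limit, prints 1052266987
```

If the limit is later raised, the same calculation gives 2104533975 for 2 GB and 4209067950 for 4 GB, so a byte-based comparison written for the 1 GB limit would presumably need the matching constant.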
You can identify the problem using the following commands:
zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l
If the result is not 0, run the command that matches your version:
On vCenter 8.0U3 and VCF 5.x:
zgrep envoy_server_memory_heap_size{} /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 1052266987}'
On vCenter 9.0, VCF 9.x:
zgrep envoy_overload_envoy_resource_monitors_fixed_heap_pressure /var/log/vmware/vstats/metrics/ENVOY_SIDECAR* | grep -v "# TYPE" | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 98}'
If the command prints 1, you have hit the envoy-sidecar memory limit.
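Both checks follow the same pattern: take the second field of every metric line, find the largest value, and compare it with a threshold. A minimal sketch of that logic as one reusable function is shown here; `max_reaches` is a hypothetical name, and the sample lines in the usage comment are invented, not real appliance output.

```shell
#!/bin/sh
# max_reaches THRESHOLD  (hypothetical helper)
# Reads "metric value" lines on stdin, skips lines starting with "#"
# (such as "# TYPE ..."), and prints 1 if the largest value is greater
# than or equal to THRESHOLD, otherwise 0.
max_reaches() {
    awk -v t="$1" '
        NF >= 2 && $1 !~ /^#/ {
            v = $2 + 0
            if (!seen || v > max) { max = v; seen = 1 }
        }
        END { print ((seen && max >= t) ? 1 : 0) }
    '
}

# Usage with invented sample data:
#   printf '%s\n' 'heap_size{} 734003200' 'heap_size{} 1060000000' | max_reaches 1052266987
```

On an appliance, the output of the `zgrep` commands above could be piped into this function in place of the `cut | sort | uniq | tail | awk` chain, with 1052266987 as the threshold for the heap-size metric or 98 for the heap-pressure metric.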
Resolved in vCenter Server 9.0.1.0. For vCenter 8.0, a fix will be available in a future release.
Workaround:
Back up the configuration file, raise the heap limit from 1 GB to 2 GB, and restart the service:
# cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back
# sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
If the issue persists, raise the limit from 2 GB to 4 GB and restart the service again:
# sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
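Because `sed` exits silently when its pattern does not match, it can help to confirm the value that actually ended up in the file before restarting the service. The following read-only sketch assumes the file contains a single `max_heap_size_bytes:` line, as in the excerpt above; `heap_limit_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# heap_limit_bytes FILE  (hypothetical helper)
# Prints the numeric value of every max_heap_size_bytes line in FILE.
# It only reads the file and never modifies it.
heap_limit_bytes() {
    sed -n 's/.*max_heap_size_bytes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Usage on the appliance:
#   heap_limit_bytes /etc/vmware-envoy-sidecar/config.yaml
# Expected values: 2147483648 after the first change, 4294967296 after the second.
```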
In some corner cases even 4 GB is not enough. We recommend completely removing these two actions:
    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.99
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98
    - name: "envoy.overload_actions.reject_incoming_connections"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 1.00
Using vim:
# vim /etc/vmware-envoy-sidecar/config.yaml
After the two actions are deleted, the entire section for overload manager in the yaml file should look like this:
overload_manager:
  refresh_interval: 1s
  resource_monitors:
    - name: "envoy.resource_monitors.global_downstream_max_connections"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.downstream_connections.v3.DownstreamConnectionsConfig
        max_active_downstream_connections: 8000
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 4294967296 # 4GB
  actions:
    - name: "envoy.overload_actions.shrink_heap"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.75
    - name: "envoy.overload_actions.disable_http_keepalive"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.8
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
    - name: "envoy.overload_actions.reduce_timeouts"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          scaled:
            scaling_threshold: 0.25
            saturation_threshold: 0.97
        - name: "envoy.resource_monitors.fixed_heap"
          scaled:
            scaling_threshold: 0.85
            saturation_threshold: 0.97
      typed_config:
        "@type": type.googleapis.com/envoy.config.overload.v3.ScaleTimersOverloadActionConfig
        timer_scale_factors:
          - timer: HTTP_DOWNSTREAM_CONNECTION_IDLE
            min_timeout: 2s
Save the file and restart sidecar service:
# service-control --restart envoy-sidecar
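A quick read-only check that neither of the two removed actions is still referenced in the file can be sketched as follows. `check_actions_removed` is a hypothetical helper name. The check matches on the action names only, so a line that was commented out rather than deleted is still reported.

```shell
#!/bin/sh
# check_actions_removed FILE  (hypothetical helper)
# Returns 0 if FILE mentions neither stop_accepting_requests nor
# reject_incoming_connections; otherwise prints the matching lines
# and returns 1.
check_actions_removed() {
    if grep -E 'envoy\.overload_actions\.(stop_accepting_requests|reject_incoming_connections)' "$1"; then
        echo "overload actions still present in $1" >&2
        return 1
    fi
    return 0
}

# Usage on the appliance:
#   check_actions_removed /etc/vmware-envoy-sidecar/config.yaml && echo "both actions removed"
```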