vCenter workloads may fail or the customer may fail to log in to vCenter.
Services are up and running and no core dumps are created. A reboot resolves the issue, but the cause is not apparent.
Symptoms
-> services are running in VAMI [:5480]
-> some services report healthy with warnings, see examples below
- vAPI Endpoint service complains about SSO - Failed to retrieve SSO settings/ Failed to login in SSO/ Failed to retrieve VIM service URI from Lookup Service
- The License, vAPI Endpoint and VMware vSphere Profile-Driven Storage Services go into degraded state [healthy with warnings]
vSphere 8.X
VCF 5.X
Envoy-sidecar is limited to 1GB of heap memory. The limit is visible in its configuration file:
# grep -C2 1073741824 /etc/vmware-envoy-sidecar/config.yaml
typed_config:
"@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
max_heap_size_bytes: 1073741824 # 1GB
actions:
- name: "envoy.overload_actions.disable_http_keepalive"
When heap usage reaches 98% of this limit, envoy-sidecar starts returning 503 overload responses, which may cause vCenter internal workloads to fail.
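The 98% threshold can be worked out directly from the configured limit. The following is a minimal sketch; the function name overload_threshold is illustrative and not part of vCenter, and the 98% default is taken from the description above.

```shell
#!/bin/bash
# Illustrative helper (not part of vCenter): compute the heap size, in bytes,
# at which overload responses start, assuming the 98% trigger described above.
overload_threshold() {
    local max_heap_bytes="$1"
    local percent="${2:-98}"
    # Integer arithmetic; bash uses 64-bit integers, so multi-GB limits fit.
    echo $(( max_heap_bytes * percent / 100 ))
}

# Example with the default 1GB limit:
#   overload_threshold 1073741824    # prints 1052266987
```

For the default limit this gives 1052266987 bytes, which is the constant used in the heap size check in this article.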
Identifying the problem via commands
# zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l
If the result is not 0, run:
# zgrep 'envoy_server_memory_heap_size{}' /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2 | sort -n | uniq | tail -1 | awk '{print ($1 >= 1052266987)}'
If the command prints 1, the envoy-sidecar memory limit has been reached.
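The two checks above can be combined into a small script. This is a sketch only: the function names are hypothetical, and it assumes the stats files hold lines of the form "envoy_server_memory_heap_size{} <bytes>", as the cut command above implies.

```shell
#!/bin/bash
# Illustrative wrapper around the two checks (function names are hypothetical).

# Count access-log lines containing "503 overload" across the given files.
count_overload_responses() {
    zgrep -h "503 overload" "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Print the highest envoy_server_memory_heap_size{} value in the given stats files
# (prints 0 when no sample is found).
peak_heap_bytes() {
    zgrep -h 'envoy_server_memory_heap_size{}' "$@" 2>/dev/null |
        awk '{ if ($2 + 0 > max) max = $2 + 0 } END { printf "%.0f\n", max }'
}

# Succeed (exit status 0) if the peak reached 98% of the limit (default 1GB).
heap_limit_reached() {
    local peak="$1" limit="${2:-1073741824}"
    [ "$peak" -ge $(( limit * 98 / 100 )) ]
}

# Usage on the appliance, with the paths from this article:
#   count_overload_responses /var/log/vmware/envoy-sidecar/envoy-access-*
#   peak=$(peak_heap_bytes /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/*)
#   heap_limit_reached "$peak" && echo "envoy-sidecar memory limit hit"
```

Passing the limit as a second argument to heap_limit_reached lets the same check be reused after the limit has been raised.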
websso.log also contains entries like:
<date && time> INFO websso[71:tomcat-http--33] [CorId=487fd2f5-e5c1-4592-b292-f987e3bda94e] [com.vmware.identity.samlservice.impl.ExternalIdpProvider] Got exception (sleeping before retry)
com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpUtil.validateHttpResponse(ApacheHttpUtil.java:101) ~[vapi-runtime-2.100.0.jar:?]
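A quick way to see how often SSO ran into this error is to count the matching lines in websso.log. This is a minimal sketch; the function name is hypothetical, and the log path in the usage comment is an assumption, since this article does not state where websso.log is stored.

```shell
#!/bin/bash
# Illustrative helper: count log lines that report "envoy overloaded".
count_sso_overloads() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c "envoy overloaded" "$1"
}

# Usage (path is an assumption; adjust to where websso.log lives on the appliance):
#   count_sso_overloads /var/log/vmware/sso/websso.log
```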
The issue is being addressed in vSphere 9 and future 8.X releases.
Patch to the latest vCenter release. If the issue still occurs on that build, apply the workaround below:
Workaround:
# cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back
# sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
If the 503 overload responses continue with the 2GB limit, raise it to 4GB:
# sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
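The sed commands above only match when the file holds the exact value they expect. As a hedged alternative, the sketch below replaces whatever value is currently set and keeps a single backup. The function name set_envoy_heap_limit is hypothetical and not a vCenter tool, and it assumes GNU sed (for -i and -E).

```shell
#!/bin/bash
# Illustrative helper (hypothetical name): back up a config file and set
# max_heap_size_bytes to a new value, whatever the current value is.
set_envoy_heap_limit() {
    local config="$1" new_bytes="$2"
    # Refuse anything that is not a plain non-negative integer.
    case "$new_bytes" in
        ''|*[!0-9]*) echo "invalid byte count: $new_bytes" >&2; return 1 ;;
    esac
    # Keep the first backup so the original setting is never overwritten.
    [ -e "$config.back" ] || cp -p "$config" "$config.back" || return 1
    # Replace only the number that follows max_heap_size_bytes.
    sed -i -E "s/(max_heap_size_bytes:[[:space:]]*)[0-9]+/\1${new_bytes}/" "$config" || return 1
    # Show the resulting line so the change can be confirmed before restarting.
    grep "max_heap_size_bytes" "$config"
}

# Usage with the path and 2GB value from the workaround:
#   set_envoy_heap_limit /etc/vmware-envoy-sidecar/config.yaml 2147483648
```

The service still has to be restarted with service-control --restart envoy-sidecar for the new limit to take effect, as in the workaround.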