[400] An error occurred while processing the authentication response from the vCenter Single Sign-On server. Details: Status: urn:oasis:names:tc:SAML:2.0:status:Requester, sub status: null.
An error occurred while fetching identity providers. Please try again later. If problem persists, contact your administrator.
no healthy upstream
vmware-cis-license), vAPI Endpoint (vmware-vapi-endpoint) and VMware vSphere Profile-Driven Storage (vmware-infraprofile) services go into a degraded state (healthy with warnings).
vCenter websso.log (/var/log/vmware/sso/websso.log) reports envoy overloaded messages.
[YYYY-MM-DDTHH:MM] INFO websso[71:tomcat-http--33] [CorId=########-####-####-####-############] [com.vmware.identity.samlservice.impl.ExternalIdpProvider] Got exception (sleeping before retry) com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
vCenter ssoAdminServer.log (/var/log/vmware/sso/ssoAdminServer.log) reports envoy overloaded.
[YYYY-MM-DDTHH:MM] ERROR ssoAdminServer [2338 : pool-2-thread-503] [OpId=########-####-####-####-############] [com.vmware.vcenter.tokenservice.providers.VcIdentityInfoProviderImpl] Failed to get identity provider matching domain VMwareID com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 308 317 gzip 4008 4007 0 <IP_address>:58010 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58482 HTTP/2 - 127.0.0.1:8085 - "Login"
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 209 331 gzip 1 1 0 <IP_address>:58010 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58482 HTTP/2 - 127.0.0.1:8085 - "Logout"
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 308 317 gzip 4009 4008 0 <IP_address>:51010 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58438 HTTP/2 - 127.0.0.1:8085 - "Login"
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 209 331 gzip 1 1 0 <IP_address>:51010 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58438 HTTP/2 - 127.0.0.1:8085 - "Logout"
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 308 317 gzip 4008 4008 0 <IP_address>:51994 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58482 HTTP/2 - 127.0.0.1:8085 - "Login"
[YYYY-MM-DDTHH:MM] info envoy[3214] [Originator@6876 sub=Default] [YYYY-MM-DDTHH:MM] POST /sdk 500 via_upstream - 209 331 gzip 1 1 0 <IP_address>:51994 HTTP/1.1 TLSv1.2<IP_address>:443 127.0.0.1:58482 HTTP/2 - 127.0.0.1:8085 - "Logout"
zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l
If the result is not 0, run the following command for your vCenter version:
For vCenter 8.0U3:
zgrep envoy_server_memory_heap_size{} /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 1052266987}'
For vCenter 9.0:
zgrep envoy_overload_envoy_resource_monitors_fixed_heap_pressure /var/log/vmware/vstats/metrics/ENVOY_SIDECAR* | grep -v "# TYPE" | cut -d ' ' -f2| sort -n | uniq | tail -1 | awk '{print $1 >= 98}'
If the above command returns 1, the envoy-sidecar memory limit has been reached
/etc/vmware-envoy-sidecar/config.yaml
# cat /etc/vmware-envoy-sidecar/config.yaml | grep -C2 1073741824
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 1073741824 # 1GB
  actions:
    - name: "envoy.overload_actions.disable_http_keepalive"
Envoy-sidecar is limited to 1 GB of memory by default. When the memory consumed by the envoy-sidecar service reaches 98% of that limit, it starts sending overload responses, which may cause failures in vCenter internal workloads.
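The constant 1052266987 used in the vCenter 8.0U3 check is consistent with 98% of the default 1073741824-byte limit. As a minimal sketch of that arithmetic (the function name `heap_threshold_bytes` is hypothetical and not part of vCenter), the cutoff for other limits could be derived the same way:

```shell
#!/bin/sh
# Hypothetical helper: number of bytes at which a percentage threshold
# of the fixed-heap limit is reached (integer arithmetic, rounds down).
heap_threshold_bytes() {
  max_heap_size_bytes="$1"   # e.g. the max_heap_size_bytes value in config.yaml
  threshold_percent="$2"     # e.g. 98 for a 0.98 trigger
  echo $(( max_heap_size_bytes * threshold_percent / 100 ))
}

heap_threshold_bytes 1073741824 98   # 1 GB limit -> 1052266987
heap_threshold_bytes 2147483648 98   # 2 GB limit -> 2104533975
heap_threshold_bytes 4294967296 98   # 4 GB limit -> 4209067950
```

If the heap limit has been raised, the hard-coded 1052266987 in the 8.0U3 check would presumably need to be replaced with the matching value for the check to remain meaningful.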
The issue can also occur when any endpoint that communicates with vCenter sends a large number of HTTPS session authentication requests to envoy. A monitoring tool or any other service that authenticates with vCenter frequently can create so many login and logout events that envoy is eventually exhausted, causing the vCenter services to go down.
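To see which client is generating the login traffic, the "Login" entries in the envoy access log can be tallied per source address. The following is only a sketch: `count_logins_by_client` is a hypothetical helper, and it assumes that the first ip:port pair on each access-log line is the client, as in the entries shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: reads envoy access-log lines on stdin and prints
# "<count> <client-ip>" for "Login" requests, busiest client first.
count_logins_by_client() {
  grep '"Login"' | awk '
    # Assumption: the first ip:port pair on the line is the client.
    match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+/) {
      split(substr($0, RSTART, RLENGTH), parts, ":")
      count[parts[1]]++
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -rn
}
```

Feed it whichever access log contains the Login and Logout entries, for example through `zcat -f` so that rotated, compressed files are included. A single address with a far higher count than the rest is a likely candidate for the frequently authenticating tool.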
If the issue is identified as multiple HTTPS session authentication requests to envoy and vCenter is running a version later than 8.0 U3g, identify which application or service the <IP_address> belongs to and engage the respective vendor or team to address the issue.
If the issue is identified as the known issue on vCenter running 8.0 U3g or an earlier version, note that it is resolved in the following releases:
For vCenter 8.x, issue is resolved in 8.0 U3h. Log in to the Broadcom Support Portal to download this patch.
For vCenter 9.x, issue is resolved in 9.0.1.0. Log in to the Broadcom Support Portal to download this patch for your entitlement (VMware vSphere Foundation or VMware Cloud Foundation).
# cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back
# sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
# sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
# service-control --restart envoy-sidecar
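The backup and sed commands above can be wrapped so that the same steps work for any target size. This is a sketch under stated assumptions: `set_heap_limit` is a hypothetical helper, it relies on GNU `sed -i`, and it keeps the first backup so that a second increase does not overwrite the original file.

```shell
#!/bin/sh
# Hypothetical helper: back up a config file once, then rewrite its
# max_heap_size_bytes value. Usage: set_heap_limit <config> <bytes>
set_heap_limit() (
  config="$1"
  new_bytes="$2"
  [ -f "$config" ] || { echo "not found: $config" >&2; exit 1; }
  # Reject anything that is not a plain non-negative integer.
  case "$new_bytes" in
    ''|*[!0-9]*) echo "bad size: $new_bytes" >&2; exit 1 ;;
  esac
  # Keep the first backup so a later run does not overwrite the original.
  [ -e "$config.back" ] || cp -p "$config" "$config.back" || exit 1
  # Replace whatever number currently follows max_heap_size_bytes.
  sed -i -E "s/(max_heap_size_bytes:[[:space:]]*)[0-9]+/\1${new_bytes}/" "$config"
  # Print the resulting line so the change can be confirmed before a restart.
  grep 'max_heap_size_bytes' "$config"
)
```

For example, `set_heap_limit /etc/vmware-envoy-sidecar/config.yaml 2147483648` would apply the 2 GB value. The helper does not restart anything, so `service-control --restart envoy-sidecar` is still required, and any trailing comment such as `# 1GB` on that line is left unchanged.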
/etc/vmware-envoy-sidecar/config.yaml file.
    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.99
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98
    - name: "envoy.overload_actions.reject_incoming_connections"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 1.00
a. Edit the envoy sidecar configuration file using vi editor to remove the two envoy-overload-actions (envoy.overload_actions.stop_accepting_requests and envoy.overload_actions.reject_incoming_connections):
# vi /etc/vmware-envoy-sidecar/config.yaml
After the two envoy-overload-actions are removed, the entire section for overload_manager in the /etc/vmware-envoy-sidecar/config.yaml file should look like this:
overload_manager:
  refresh_interval: 1s
  resource_monitors:
    - name: "envoy.resource_monitors.global_downstream_max_connections"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.downstream_connections.v3.DownstreamConnectionsConfig
        max_active_downstream_connections: 8000
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        max_heap_size_bytes: 4294967296 # 4GB
  actions:
    - name: "envoy.overload_actions.shrink_heap"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.75
    - name: "envoy.overload_actions.disable_http_keepalive"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          threshold:
            value: 0.8
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
    - name: "envoy.overload_actions.reduce_timeouts"
      triggers:
        - name: "envoy.resource_monitors.global_downstream_max_connections"
          scaled:
            scaling_threshold: 0.25
            saturation_threshold: 0.97
        - name: "envoy.resource_monitors.fixed_heap"
          scaled:
            scaling_threshold: 0.85
            saturation_threshold: 0.97
      typed_config:
        "@type": type.googleapis.com/envoy.config.overload.v3.ScaleTimersOverloadActionConfig
        timer_scale_factors:
          - timer: HTTP_DOWNSTREAM_CONNECTION_IDLE
            min_timeout: 2s
b. Save the file and exit (press ESC, type :wq!, press Enter)
c. Restart envoy sidecar service:
# service-control --restart envoy-sidecar
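Because the edit in step a is made by hand, it may be worth confirming that neither overload action is still present in the file. A minimal sketch, assuming a hypothetical helper named `overload_actions_removed` that only reads the file:

```shell
#!/bin/sh
# Hypothetical helper: returns 0 only if neither of the two overload
# actions named in step a remains in the given config file.
overload_actions_removed() {
  config="$1"
  [ -f "$config" ] || { echo "not found: $config" >&2; return 2; }
  # grep -n prints any leftover lines with their line numbers.
  if grep -n -E 'envoy\.overload_actions\.(stop_accepting_requests|reject_incoming_connections)' "$config"; then
    echo "overload actions still present in $config" >&2
    return 1
  fi
  return 0
}
```

Running `overload_actions_removed /etc/vmware-envoy-sidecar/config.yaml` prints any leftover entries and returns a non-zero status. It only checks for the action names and does not validate the YAML structure of the rest of the file.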
Occasionally, the vSAN menu remains missing from the vSphere UI despite the workaround. In these instances, the following logs will still be generated.
Log location: /var/log/vmware/vsphere-ui/logs/vsphere_client_virgo.log
[YYYY-MM-DDTHH:MM] [ERROR] sdk-plugin-deployer-3140 com.vmware.vise.plugin.status.RemotePluginStatusServiceImpl DEPLOYMENT_FAILED: Error deploying plugin package com.vmware.vsan.client:8.0.203.10000. Reason: Plugin configuration with Reverse Proxy failed.
[YYYY-MM-DDTHH:MM] [WARN ] sdk-plugin-deployer-104 com.vmware.bifrost.bus.EventBusLowApiImpl Failed to send message. Cannot find channel: plugin-state-change-notification
[YYYY-MM-DDTHH:MM] [INFO ] sdk-plugin-deployer-106 com.vmware.vise.plugin.registry.VcExtensionStateRegistry Updating entry: Plugin: 'com.vmware.vsan.client:8.0.203.10000', State: 'FAILED_CONFIGURE'
In that case, restarting the vmware-vsan-health and vsphere-ui services can resolve the problem:
# service-control --restart vmware-vsan-health
# service-control --restart vsphere-ui
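To check whether the vSAN plugin failure signatures shown above are present in the vSphere UI log, the matching entries can be counted. This is a sketch only: `vsan_plugin_failures` is a hypothetical helper, and it counts every matching line in the file, including entries written before the services were restarted.

```shell
#!/bin/sh
# Hypothetical helper: counts log lines that mention the vSAN client
# plugin together with DEPLOYMENT_FAILED or FAILED_CONFIGURE.
vsan_plugin_failures() {
  logfile="$1"
  grep -E 'com\.vmware\.vsan\.client' "$logfile" 2>/dev/null \
    | grep -c -E 'DEPLOYMENT_FAILED|FAILED_CONFIGURE'
}
```

For example, `vsan_plugin_failures /var/log/vmware/vsphere-ui/logs/vsphere_client_virgo.log` prints the number of matching entries. Comparing the count before and after the restart indicates whether new failures are still being logged.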