Decommissioning the ESXi host from SDDC Manager fails with error "Failed to run guardrail validations of type HOST_DECOMMISSION on resource"

Article ID: 441998

Updated On:

Products

VMware SDDC Manager / VCF Installer

VMware vCenter Server

Issue/Introduction

  • During the host decommissioning workflow within VMware Cloud Foundation (VCF) SDDC Manager, the management operation fails during validation steps. The SDDC Manager UI shows the following error:

    • Error message:
      HOST_DECOMMISSION operation validation failed due to: Failed to run guardrail validations of type HOST_DECOMMISSION on resource.

      Message: HOST_DECOMMISSION operation validation failed due to: Failed to run guardrail validations of type HOST_DECOMMISSION on resource.
  • The underlying communication failure prevents the inventory sync tasks from completing. The failure is recorded in the following logs:

    • The Operations Manager log /var/log/vmware/operationsmanager/operationsmanager.log shows that SDDC Manager cannot connect to vCenter Server for the inventory sync during vAPI invocation. The remediation text in the message suggests rebooting or powering up the vCenter Server if required.
      Validating the result of the inventory sync: {"entitiesToSync":["VCENTER","NSXT_CLUSTER","ESXI"],"entitiesToUpdate":[],"completedInventorySyncTasks":[{"entity":"ESXI","syncStatus":"FAILED","errors":[{"errorCode":"ESX_RESOURCE_VERSION_FETCH_FAILED","errorMessage":"Cannot connect to vCenter <vCenter_fqdn> of domain <domain_name> to complete inventory sync.","errorType":"ERROR","cause":"java.util.concurrent.ExecutionException: (vim.fault.InvalidLogin) {\n   faultCause \u003d null,\n   faultMessage \u003d null\n}","remediation":"Please ensure the vCenter <vCenter_name> is up and running."}]},{"entity":"VCENTER","syncStatus":"FAILED","errors":[{"errorCode":"VC_RESOURCE_VERSION_FETCH_FAILED","errorMessage":"Cannot connect to vCenter <vCenter_name> of domain <domain_name> to complete inventory sync.","errorType":"ERROR","cause":"java.lang.RuntimeException: Exception occurred during vAPI invocation: java.util.concurrent.ExecutionException: com.vmware.vapi.std.errors.ServiceUnavailable: ServiceUnavailable (com.vmware.vapi.std.errors.service_unavailable) \u003d\u003e {\n    messages \u003d [LocalizableMessage (com.vmware.vapi.std.localizable_message) \u003d\u003e {\n    id \u003d com.vmware.vapi.endpoint.cis.ServiceUnavailable,\n    defaultMessage \u003d Service unavailable.,\n    args \u003d [],\n    params \u003d \u003cnull\u003e,\n    localized \u003d \u003cnull\u003e\n}],\n    data \u003d \u003cnull\u003e,\n    errorType \u003d SERVICE_UNAVAILABLE\n}","remediation":"Please ensure the vCenter <vCenter_name> is up and SSH connectivity is available. Reboot/Power up the vCenter if required."}]}


    • From the vCenter Server endpoint log /var/log/vmware/vapi/endpoint/endpoint.log:
      vAPI-I/O dispatcher-1     | SessionFacade                  | 1855a5bc-d31b-9405-9769-d312fc2162dc | Unexpected error occurred while executing the call with session null for method com.vmware.cis.session.create.

      com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
    • From the vCenter Server SSO log /var/log/vmware/sso/ssoAdminServer.log:
      [com.vmware.vcenter.tokenservice.providers.VcIdentityInfoProviderImpl] Failed to get identity provider matching domain VMwareID com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
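The log signatures above can be checked quickly with a small helper. The following is a minimal sketch, not a VMware-provided tool: the function name count_marker is hypothetical, and the only strings and paths it is used with are the ones quoted in the log excerpts above.

```shell
# count_marker is a hypothetical helper, not a VMware tool. It counts the
# lines that contain a fixed string across the given log files; rotated
# *.gz files are decompressed on the fly and missing files are skipped.
count_marker() {
    local marker="$1" total=0 n f
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) n=$(gzip -dc -- "$f" | grep -c -F -e "$marker" || true) ;;
            *)    n=$(grep -c -F -e "$marker" -- "$f" || true) ;;
        esac
        total=$(( total + ${n:-0} ))
    done
    echo "$total"
}

# On vCenter Server:
#   count_marker "envoy overloaded" \
#       /var/log/vmware/vapi/endpoint/endpoint.log \
#       /var/log/vmware/sso/ssoAdminServer.log
# On SDDC Manager:
#   count_marker "VC_RESOURCE_VERSION_FETCH_FAILED" \
#       /var/log/vmware/operationsmanager/operationsmanager.log
```

A non-zero count on both appliances is consistent with the symptom described here, but it does not by itself prove that the Envoy memory limit was reached.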

Environment

VMware vCenter Server 7.x, 8.x

VMware SDDC Manager 5.2.2

Cause

SDDC Manager cannot connect to vCenter Server for inventory synchronization during vAPI invocation because the vCenter Envoy sidecar service is overloaded. By default, the Envoy sidecar is limited to 1 GB of memory and returns HTTP 503 overload responses when memory consumption reaches 98% of that limit, which causes failures in internal vCenter workloads.

  • To validate the issue further, run the following command from an SSH session on the vCenter Server:
    zgrep "503 overload" /var/log/vmware/envoy-sidecar/envoy-access-* | wc -l
    • If the result is greater than 0, run the command that matches the vCenter version:

      For vCenter 8.0U3:

      zgrep envoy_server_memory_heap_size{} /var/cache/vmware-rhttpproxy/envoy-sidecar-stats/* | cut -d ' ' -f2|  sort -n | uniq | tail -1 | awk '{print $1 >= 1052266987}'

      For vCenter 9.0:

      zgrep envoy_overload_envoy_resource_monitors_fixed_heap_pressure /var/log/vmware/vstats/metrics/ENVOY_SIDECAR* | grep -v "# TYPE" | cut -d ' ' -f2|  sort -n | uniq | tail -1 | awk '{print $1 >= 98}'


      If the command prints 1, the envoy-sidecar memory limit has been reached.

  • The currently configured memory heap size for the envoy-sidecar service can be found in /etc/vmware-envoy-sidecar/config.yaml:
    # cat /etc/vmware-envoy-sidecar/config.yaml | grep -C2 1073741824
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
          max_heap_size_bytes: 1073741824 # 1GB
    actions:
        - name: "envoy.overload_actions.disable_http_keepalive"
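The constant 1052266987 in the vCenter 8.0U3 command above is 98% of the default 1073741824-byte limit, rounded down. If the limit has already been raised, that constant no longer matches the configuration. The following is a minimal sketch that reads the configured limit and applies the same 98% comparison. It assumes the 98% trigger scales with whatever max_heap_size_bytes is configured, which this article does not state explicitly. Both function names are hypothetical.

```shell
# Both functions are hypothetical helpers, not VMware tools.

# Print the first max_heap_size_bytes value found in the config file.
configured_heap_limit() {
    local cfg="${1:-/etc/vmware-envoy-sidecar/config.yaml}"
    sed -n 's/^[[:space:]]*max_heap_size_bytes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | head -n 1
}

# Print 1 if the peak heap value is at or above 98% of the limit, else 0.
heap_limit_reached() {
    local peak_bytes="$1" limit_bytes="$2"
    if [ "$peak_bytes" -ge "$(( limit_bytes * 98 / 100 ))" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example with the default 1 GB limit:
#   heap_limit_reached 1052266987 1073741824   # prints 1
#   heap_limit_reached 1052266986 1073741824   # prints 0
```

The peak value can be taken from the 8.0U3 pipeline above by dropping its final awk stage, then passed in together with the output of configured_heap_limit.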

Resolution

Use one of the following remediation options:

  • Immediate Workaround: Reboot the affected vCenter Server appliance to clear the Envoy memory state and allow the host decommission workflow to proceed.


  • Configuration Workaround: Increase the Envoy sidecar memory heap size to 2 GB or, if that is not sufficient, 4 GB:


    1. Take snapshots of the vCenter or ELM vCenter group (See vCenter in Enhanced Linked Mode pre-changes snapshot best practice for guidance on taking snapshots of vCenters in ELM)

    2. Log in to the vCenter via SSH.

    3. Create a backup of the envoy sidecar config file:
      # cp /etc/vmware-envoy-sidecar/config.yaml /etc/vmware-envoy-sidecar/config.yaml.back
    4. Use sed to update the Envoy memory limit from 1073741824 (1 GB) to 2147483648 (2 GB):
      # sed -i 's/max_heap_size_bytes: 1073741824/max_heap_size_bytes: 2147483648/g' /etc/vmware-envoy-sidecar/config.yaml
    5. Restart envoy-sidecar:
      # service-control --restart envoy-sidecar
    6. In some cases, an Envoy memory limit of 2 GB is still not sufficient. If so, update the Envoy memory limit from 2 GB to 4 GB:
      # sed -i 's/max_heap_size_bytes: 2147483648/max_heap_size_bytes: 4294967296/g' /etc/vmware-envoy-sidecar/config.yaml
      # service-control --restart envoy-sidecar
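The sed commands in the steps above match one specific old value, so they change nothing if the file already holds a different limit. The following is a minimal sketch of a value-independent variant, offered as an illustration only. The function name set_heap_limit is hypothetical, and the snapshot and restart steps above still apply.

```shell
# set_heap_limit is a hypothetical helper, not a VMware tool.
# Usage: set_heap_limit <config file> <new limit in bytes>
set_heap_limit() {
    local cfg="$1" new_bytes="$2" tmp rc
    case "$new_bytes" in
        ''|*[!0-9]*) echo "limit must be a whole number of bytes" >&2; return 1 ;;
    esac
    grep -q 'max_heap_size_bytes:' "$cfg" || {
        echo "max_heap_size_bytes not found in $cfg" >&2
        return 1
    }
    # Keep the first backup so the original value is never overwritten.
    [ -e "$cfg.back" ] || cp -p "$cfg" "$cfg.back" || return 1
    tmp=$(mktemp) || return 1
    # Replace whatever number follows the key; a trailing comment such as
    # "# 1GB" is left untouched, as with the sed commands in the steps above.
    sed "s/\(max_heap_size_bytes:[[:space:]]*\)[0-9][0-9]*/\1${new_bytes}/" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example, followed by the restart from step 5:
#   set_heap_limit /etc/vmware-envoy-sidecar/config.yaml 2147483648
#   service-control --restart envoy-sidecar
```

Writing the result back with cat rather than moving the temporary file keeps the ownership and permissions of the original config file.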

  • Permanent Resolution: Upgrade vCenter Server to the following version:

Additional Information

 vSphere client inaccessible and vAPI endpoint degraded due to Envoy overload