Data collection for vSphere cloud accounts is failing in Aria Automation. Connectivity to the accounts and the stored credentials are working, because image synchronization and provisioning still complete successfully.

Article ID: 392392

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The data collection can run for a long time and eventually fails.
  • The provisioning service log at /services-logs/prelude/provisioning-service-app/file-logs/provisioning-service-app.log contains the following error messages:

    • java.util.concurrent.CompletionException: java.lang.IllegalStateException: Failed to retrieve properties.
    • Fault received from vCenter  [...]   The session is not authenticated.
    • Existing connection has been closed or is invalid. Removing it from pool.

  • Additionally, you may see log entries similar to the following:
     

    ERROR provisioning [host='provisioning-service-app-##########-#####' thread='reactor-http-epoll-##' user='' org='' trace='' parent='' span=''] o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler.error:102 - [######-#####] 500 Server Error for HTTP GET "/provisioning/config/toggles/access"
    com.vmware.automation.spring.webflux.platform.client.service.exception.WebClientServiceResponseException: ClientResponse has erroneous status code: 500 Internal Server Error. WebClientServiceResponseException.ErrorDetails(timestamp="", path=/rbac-service/api/auth-context, type=com.vmware.automation.spring.webflux.platform.client.service.exception.WebClientServiceResponseException, errorCode=0, messageKey=null, messageArguments=null, message=ClientResponse has erroneous status code: 500 Internal Server Error. WebClientServiceResponseException.ErrorDetails(timestamp:"", path=null, type=null, errorCode=0, messageKey=null, messageArguments=null, message=null, causeMessage=null, status=500 INTERNAL_SERVER_ERROR, error=Internal Server Error, exception=null, additional=

    {type=SERVER_ERROR, serverMessage=handshake timed out after 10000ms}
    ), causeMessage=null, status=500, error=Internal Server Error, exception=null, additional=

    {requestId=#############-######}
    )
    at com.vmware.automation.spring.webflux.platform.client.WebClientUtil.toResponseException(WebClientUtil.java:###) ~[platform-client-3.1.####-##############.jar:na]
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
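
As a quick check, the error signatures listed above can be counted in the provisioning service log. The following is a minimal sketch, assuming a POSIX shell with grep is available where the log is read. The function name count_signature_errors is hypothetical, and the search strings are copied from the messages in this article.

```shell
#!/usr/bin/env bash
# Sketch only: count the lines in a log file that contain any of the
# error signatures quoted in this article. The function name is illustrative.
count_signature_errors() {
  # -F treats each pattern as a fixed string, -c prints the number of matching lines.
  grep -c -F \
    -e 'Failed to retrieve properties' \
    -e 'The session is not authenticated' \
    -e 'Existing connection has been closed or is invalid' \
    -e 'handshake timed out after' \
    "$1"
}
```

For example, count_signature_errors /services-logs/prelude/provisioning-service-app/file-logs/provisioning-service-app.log prints the number of matching lines. A count above zero suggests, but does not prove, that this article applies.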

Environment

  • VMware Aria Automation 8.x

Cause

Transient communication issues between Aria Automation and vCenter can cause data collection to become stuck, requiring a restart to resolve.

Resolution

This issue will be addressed in the upcoming VMware Aria Automation 8.18.1 P5 release. Meanwhile, you can use either of the following methods as a workaround.

  • You can mark the data collection for a cloud account as Failed using this API call. Data collection then restarts automatically within 10 minutes.

    curl -kv --location --request PATCH 'https://<FQDN>/iaas/api/cloud-accounts/<cloud_account_id>?apiVersion=2021-07-15' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
      "customProperties": {
        "enumerationTaskState": "FAILED"
      },
      "privateKey":"<vSphere_password>",
      "privateKeyId":"<vSphere_username>"
    }'
      
  • Alternatively, delete the three provisioning-service-app pods to restart the provisioning service, which is responsible for data collection (among other things).

    Note that VM provisioning and other functions will be unavailable for a few minutes while the pods restart.

    1. Get pod names:  
      • kubectl -n prelude get pods | grep provisioning-service-app
    2. Restart pods:  
      • kubectl -n prelude delete pods <POD1> <POD2> <POD3> 
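
The two workarounds above can be wrapped in small shell functions so the placeholders are passed as arguments. This is a minimal sketch under stated assumptions, not a supported tool: all function names are hypothetical, the request and kubectl commands are the ones shown in this article, and the JSON escaping handles only backslashes and double quotes in the credentials.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and argument order are illustrative.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the request body from this article for the given vSphere username and password.
build_payload() {
  local user pass
  user=$(json_escape "$1")
  pass=$(json_escape "$2")
  printf '{"customProperties":{"enumerationTaskState":"FAILED"},"privateKey":"%s","privateKeyId":"%s"}\n' "$pass" "$user"
}

# Workaround 1: send the PATCH request.
# Arguments: FQDN, cloud account ID, access token, vSphere username, vSphere password.
mark_enumeration_failed() {
  local fqdn="$1" account_id="$2" token="$3" user="$4" pass="$5"
  curl -k --fail --silent --show-error --location --request PATCH \
    "https://${fqdn}/iaas/api/cloud-accounts/${account_id}?apiVersion=2021-07-15" \
    --header "Authorization: Bearer ${token}" \
    --header 'Content-Type: application/json' \
    --data-raw "$(build_payload "$user" "$pass")"
}

# Read `kubectl get pods` output on stdin and print the provisioning-service-app pod names.
provisioning_pods() {
  awk '$1 ~ /^provisioning-service-app/ { print $1 }'
}

# Workaround 2: delete every provisioning-service-app pod in the prelude namespace.
restart_provisioning_pods() {
  local pods
  pods=$(kubectl -n prelude get pods --no-headers | provisioning_pods)
  [ -n "$pods" ] || { echo "no provisioning-service-app pods found" >&2; return 1; }
  # Word splitting is intended here: one argument per pod name.
  kubectl -n prelude delete pods $pods
}
```

Neither function runs when the file is sourced. Call mark_enumeration_failed with the five values for the first workaround, or restart_provisioning_pods for the second, then rerun the "Get pod names" command to confirm the pods are running again.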

For a similar issue with image synchronization, see KB 326017: Restarting stuck image synchronization in VMware Aria Automation.
