Intermittent Deployment Failures in Aria Automation Due to Provisioning-Service Instability

Article ID: 395269

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Deployments and Day 2 actions in Aria Automation fail intermittently. The returned errors relate to network and compute placement:
    • No placement exists that satisfies all of the request requirements

  • The failures are inconsistent, and a retry often results in a successful deployment.
  • The provisioning-service-app pods crash without a clear error or reason logged in /services-logs/prelude/provisioning-service-app/file-logs/provisioning-service-app.log
  • Heap dumps (*.hprof files) are created in the directory /services-logs/prelude/provisioning-service-app/file-logs/
  • The following error appears at the time the pod crashes, in /services-logs/prelude/provisioning-service-app/console-logs/provisioning-service-app.log:
    • java.lang.OutOfMemoryError: Java heap space
      Dumping heap to /var/log/####-##-##_##-##-##_heap_dump.hprof ...
      Heap dump file created [########## bytes in ##.### secs]
      Terminating due to java.lang.OutOfMemoryError: Java heap space

  • In some scenarios, the error messages may indicate a mismatch with the selected network profile.
  • During allocation, the provisioning service restarts and reports HTTP 404 errors. Example log snippets:
    • WARN provisioning [host='provisioning-service-app-<ID>' thread='xn-index-queries-11' user='tango-blueprint-<ID>(<user>)' org='<org_id>' trace='<trace_id>' parent='<parent_id>' span='<span_id>'] c.v.xenon.common.ServiceErrorResponse.create:83 - message: Service not found: http://<IP_Address>:8282/provisioning/config/extensibility-callbacks/<Callback_id>, statusCode: 404, serverErrorId: <serverErrorId>
    • WARN provisioning [host='provisioning-service-app-<ID>' thread='xn-index-queries-11' user='tango-blueprint-<ID>(<user>)' org='<org_id>' trace='<trace_id>' parent='<parent_id>' span='<span_id>'] c.v.a.s.w.p.x.c.StatefulServiceController.lambda$loadService$20:708 - Failed to start service /provisioning/config/extensibility-callbacks/<Callback_id> with 404 status code.
    • WARN provisioning [host='provisioning-service-app-<ID>' thread='xn-index-queries-11' user='tango-blueprint-<ID>(<user>)' org='<org_id>' trace='<trace_id>' parent='<parent_id>' span='<span_id>'] c.v.xenon.common.ServiceErrorResponse.create:83 - message: Service not found: http://<IP_Address>:8282/provisioning/config/extensibility-callbacks/<Callback_id>, statusCode: 404, serverErrorId: <serverErrorId>

  • When the provisioning services restart on the three nodes during a deployment, OOM may be reported as the reason for these restarts:
    • java.lang.OutOfMemoryError: Java heap
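
The symptoms above can be checked with a short script. The following is a minimal sketch, not an official tool: it assumes the log paths listed in this article are readable from where the script runs, and the function names (count_oom_lines, list_heap_dumps) are hypothetical helpers introduced only for illustration.

```python
import re
from pathlib import Path

# Paths taken from this article; adjust them if the logs live elsewhere.
LOG_ROOT = Path("/services-logs/prelude/provisioning-service-app")
CONSOLE_LOG = LOG_ROOT / "console-logs" / "provisioning-service-app.log"
HEAP_DUMP_DIR = LOG_ROOT / "file-logs"

# Matches the heap OutOfMemoryError message shown in the console log.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError: Java heap")


def count_oom_lines(lines):
    """Return how many lines contain the Java heap OutOfMemoryError message."""
    return sum(1 for line in lines if OOM_PATTERN.search(line))


def list_heap_dumps(directory):
    """Return the sorted *.hprof file names found directly in a directory."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.hprof"))


if __name__ == "__main__":
    if CONSOLE_LOG.is_file():
        with CONSOLE_LOG.open(errors="replace") as handle:
            print("OOM lines:", count_oom_lines(handle))
    else:
        print("Console log not found:", CONSOLE_LOG)
    print("Heap dumps:", list_heap_dumps(HEAP_DUMP_DIR))
```

A non-zero OOM line count together with one or more *.hprof files is consistent with the heap exhaustion described in this article.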

Environment

VMware Aria Automation 8.x

Cause

  • The provisioning-service runs out of Java heap memory.
  • When the service restarts during the allocation phase, requests may fail with HTTP 404 errors, so the required network or deployment resources cannot be allocated.
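
To see which services returned 404 around a restart, the "Service not found" warnings can be summarized per service path. This is a rough sketch that assumes the warning format shown in the log snippets of this article; summarize_not_found is a hypothetical helper, and the IP address and callback ID in the usage lines are placeholders.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the "Service not found" warning format shown in this article.
NOT_FOUND = re.compile(
    r"Service not found: (?P<url>\S+), statusCode: (?P<status>\d+)"
)


def summarize_not_found(lines):
    """Count 404 'Service not found' warnings per service path."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match and match.group("status") == "404":
            # Keep only the path so the count is independent of the node IP.
            counts[urlsplit(match.group("url")).path] += 1
    return counts


# Usage with a placeholder IP address and callback ID:
example = (
    "WARN provisioning [...] c.v.xenon.common.ServiceErrorResponse.create:83 - "
    "message: Service not found: http://10.0.0.1:8282/provisioning/config/"
    "extensibility-callbacks/abc, statusCode: 404, serverErrorId: x"
)
print(summarize_not_found([example, example]))
```

If the timestamps of these warnings cluster around the OutOfMemoryError entries in the console log, that supports the restart being the source of the 404 responses.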

Resolution

  • Vertically scale up (resize) the Automation nodes from the current profile to the XL hardware profile.
  • Scaling up meets the higher resource demand, mainly by addressing the JVM heap limit.
  • The JVM heap for each service within Aria Automation is set by the sizing guidelines:
    • The Medium profile for Aria Automation allocates 4GB of memory to the provisioning-service-app, of which 2400MB is provided as Java heap memory.
  • If you are already using the Automation XL profile, please contact Broadcom Support for assistance.
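
As a quick illustration of the Medium profile figures quoted above, the share of the provisioning-service-app memory that is available as Java heap can be worked out as follows. This is only an arithmetic sketch: it assumes 1GB = 1024MB, and heap_fraction is a hypothetical helper. The XL profile values are not stated in this article, so they are not shown.

```python
def heap_fraction(heap_mb, container_gb):
    """Return the share of the container memory given to the Java heap."""
    container_mb = container_gb * 1024  # assumption: 1GB = 1024MB
    return heap_mb / container_mb


# Medium profile figures from this article: 2400MB heap out of 4GB.
medium = heap_fraction(2400, 4)
print(f"Medium profile heap share: {medium:.1%}")  # prints 58.6%
```

The remaining memory covers non-heap JVM usage, which is why the heap is smaller than the total allocated to the service.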