Automation Orchestrator pods continuously restart with no obvious heap dumps or errors

Article ID: 376909

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Orchestrator pods are restarting frequently
  • There are no heap dumps created in /services-logs/prelude/vco-app/file-logs/
  • There are no OutOfMemory Exceptions in the server logs
  • Journal logs show oom-killer records
  • Garbage collector logs show allocation failures: /services-logs/prelude/vco-app/file-logs/vco-server-app_gc.log

[2025-07-16T08:36:17.407+0000][2.627s][info][gc,heap     ] GC(0) ParOldGen: 0K(1502208K)->104K(1502208K)
[2025-07-16T08:36:17.407+0000][2.627s][info][gc,metaspace] GC(0) Metaspace: 14254K(14528K)->14254K(14528K) NonClass: 12837K(12992K)->12837K(12992K) Class: 1417K(1536K)->1417K(1536K)
[2025-07-16T08:36:17.407+0000][2.627s][info][gc          ] GC(0) Pause Young (Allocation Failure) 550M->11M(2108M) 16.906ms
[2025-07-16T08:36:17.407+0000][2.627s][info][gc,cpu      ] GC(0) User=0.05s Sys=0.00s Real=0.02s
[2025-07-16T08:36:17.407+0000][2.627s][info][safepoint   ] Safepoint "ParallelGCFailedAllocation", Time since last: 746961732 ns, Reaching safepoint: 3963 ns, Cleanup: 120317 ns, At safepoint: 16980401 ns, Total: 17104681 ns
[2025-07-16T08:36:18.602+0000][3.822s][info][gc,start    ] GC(1) Pause Young (Allocation Failure)
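
The symptom checks above can be sketched as shell helpers. This is a minimal sketch, not part of the original procedure: the function names count_allocation_failures and count_oom_killer_records are hypothetical, and the grep patterns assume the log lines look like the excerpt above.

```shell
# Hypothetical helpers; the patterns assume the GC log format shown in the excerpt above.

# Count completed young-generation pauses caused by an allocation failure.
# Requiring a digit after the closing parenthesis skips the paired
# "gc,start" line that is logged when each pause begins.
count_allocation_failures() {
  grep -c 'Pause Young (Allocation Failure) [0-9]' "$1" || true
}

# Count oom-killer records in kernel log text read from stdin.
count_oom_killer_records() {
  grep -ci 'oom-killer' || true
}

# Example usage on an appliance (log path taken from this article):
#   count_allocation_failures /services-logs/prelude/vco-app/file-logs/vco-server-app_gc.log
#   journalctl -k | count_oom_killer_records
```

A non-zero oom-killer count combined with no heap dumps and no OutOfMemory entries in the server logs matches the pattern described in this article.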

Environment

VMware Aria Automation Orchestrator 8.13 and later

Cause

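The pod does not have enough non-heap memory for the garbage collector to work properly. Because the kernel oom-killer terminates the process from outside the JVM, no OutOfMemory exception is logged and no heap dump is written.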

Resolution

Memory allocation improvements are included in Orchestrator 8.18.1 Patch 2. Refer to VMware Aria Automation 8.18.1 Cumulative Update #2.

Workaround for standalone Orchestrator Appliances

Prerequisites

Snapshot your environment.

Procedure

  1. Edit the resource metrics file in your custom profile with the desired memory values.
    vi /etc/vmware-prelude/profiles/custom-profile/helm/prelude_vco/90-resources.yaml
  2. Ensure that serverMemoryRequest is at least 50% larger than serverJvmHeapMax and that serverMemoryLimit is at least 2G larger than serverMemoryRequest.
    • If serverMemoryRequest cannot be increased, decrease serverJvmHeapMax to 60% of serverMemoryLimit or less.
    • For Aria Orchestrator 8.18.1 environments with this issue, set serverJvmHeapMax to 40% of serverMemoryLimit.
  3. Run /opt/scripts/deploy.sh to restart the system.
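
The sizing rules in step 2 can be checked with simple arithmetic before editing the file. The following is a minimal sketch under stated assumptions: check_vco_memory and heap_for_limit are hypothetical helper names, all arguments are plain integers in MiB, and 2G is treated as 2048 MiB. The sketch does not read or write 90-resources.yaml.

```shell
# Hypothetical helpers. All values are integers in MiB (assumption: 2G = 2048 MiB).

# Usage: check_vco_memory <serverJvmHeapMax> <serverMemoryRequest> <serverMemoryLimit>
# Prints each violated rule and returns non-zero if any rule is violated.
check_vco_memory() {
  vco_heap=$1
  vco_request=$2
  vco_limit=$3
  vco_status=0
  # serverMemoryRequest should be at least 50% larger than serverJvmHeapMax.
  if [ $((vco_request * 2)) -lt $((vco_heap * 3)) ]; then
    echo "serverMemoryRequest is less than 150% of serverJvmHeapMax"
    vco_status=1
  fi
  # serverMemoryLimit should be at least 2G larger than serverMemoryRequest.
  if [ "$vco_limit" -lt $((vco_request + 2048)) ]; then
    echo "serverMemoryLimit is less than serverMemoryRequest + 2G"
    vco_status=1
  fi
  # Report the heap share so it can be compared with the 60% and 40% guidance.
  echo "serverJvmHeapMax is $((vco_heap * 100 / vco_limit))% of serverMemoryLimit"
  return "$vco_status"
}

# Usage: heap_for_limit <serverMemoryLimit> <percent>
# Prints the heap size (rounded down) for a given share of the limit.
heap_for_limit() {
  echo $(($1 * $2 / 100))
}
```

For example, `check_vco_memory 4096 6144 8192` passes both rules, and `heap_for_limit 10240 60` prints 6144, the largest heap that stays within 60% of a 10240 MiB limit.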



Workaround for Embedded Orchestrator in Aria Automation Appliances on version 8.18.1

Version 8.18.1 includes an additional memory allocation library, jemalloc, which has been shown to manage non-heap memory more effectively under certain conditions.


Prerequisites

Snapshot your environment.

Procedure

  1. SSH to each Aria Automation appliance as the root user.
  2. On each appliance, edit /opt/charts/vco/templates/deployment.yaml:

    change
            ./create_server_symlinks && rm -rf /usr/lib/vco/app-server/conf/restart_required && /var/opt/apache-tomcat/bin/catalina.sh run
    
    to
    
            ./create_server_symlinks && rm -rf /usr/lib/vco/app-server/conf/restart_required && export LD_PRELOAD=/usr/lib/libjemalloc.so.2 && /var/opt/apache-tomcat/bin/catalina.sh run

    i) To open the file in a text editor:

    vi /opt/charts/vco/templates/deployment.yaml

    ii) Press the Insert key to enter edit mode.

    iii) To write and save the changes:

    :wq!

    To exit without saving:

    :q!


  3. After changing deployment.yaml, redeploy the services with deploy.sh. This only needs to be run on the primary node:

    /opt/scripts/deploy.sh
  4. Once the services come back up, verify that the change is applied correctly with:

    k -n prelude describe deployment vco-app | less

    The Command section of the vco-server-app container should now include the LD_PRELOAD export:

    ...   
      vco-server-app:
        Image:      vco_private:latest
        Port:       8280/TCP
        Host Port:  0/TCP
        Command:
          /bin/bash
          -c
          ./create_server_symlinks && rm -rf /usr/lib/vco/app-server/conf/restart_required && export LD_PRELOAD=/usr/lib/libjemalloc.so.2 && /var/opt/apache-tomcat/bin/catalina.sh run
        Limits:
    ... 
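
The edit in step 2 and the check in step 4 can also be sketched as stdin filters. This is a minimal sketch, not a supported tool: add_jemalloc_preload and has_jemalloc_preload are hypothetical helper names, neither modifies any file, and the sed pattern assumes the start command appears in deployment.yaml exactly as quoted in step 2.

```shell
# Hypothetical helpers; neither one modifies any file.

# Read deployment.yaml text on stdin and print it with the LD_PRELOAD export
# inserted before the catalina.sh start. Lines that already set LD_PRELOAD
# are left unchanged, so running the filter twice has no further effect.
add_jemalloc_preload() {
  sed '/LD_PRELOAD=/!s#&& /var/opt/apache-tomcat/bin/catalina\.sh run#\&\& export LD_PRELOAD=/usr/lib/libjemalloc.so.2 \&\& /var/opt/apache-tomcat/bin/catalina.sh run#'
}

# Succeed only if the text on stdin contains the jemalloc preload export.
has_jemalloc_preload() {
  grep -q 'export LD_PRELOAD=/usr/lib/libjemalloc\.so\.2'
}

# Example usage on an appliance:
#   add_jemalloc_preload < /opt/charts/vco/templates/deployment.yaml | diff /opt/charts/vco/templates/deployment.yaml -
#   k -n prelude describe deployment vco-app | has_jemalloc_preload && echo "jemalloc preload is applied"
```

Piping the filter output through diff previews the change without touching the chart, which makes it easy to compare against a manual edit in vi.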




Additional Information

For steps to scale the heap memory size of the Automation Orchestrator Server, refer to the documentation: Link