Services fail to start and the vco-server-app pods keep terminating with exit code 143

Article ID: 314931

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • VMware Aria Automation and/or VMware Aria Automation Orchestrator 8.x services fail to start.
  • In the vco-server-app-catalina.log file, located in the /services-logs/prelude/vco-app/file-logs/ directory, you see messages similar to:
    org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [vco] created a ThreadLocal with key of type [java.lang.InheritableThreadLocal] (value [java.lang.InheritableThreadLocal@<hash>]) and a value of type [org.eclipse.jgit.nls.NLS] (value [org.eclipse.jgit.nls.NLS@<hash>]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
  • You may also find messages similar to:

    com.vmware.o11n.git.StagingRepositoryService - Checkout took *ms

    Note: The asterisk stands for the number of milliseconds it took to clone the Version History Git repository.

  • The vco-server-app_catalina.log file contains messages similar to the following:
    06-Jul-2023 18:01:04.175 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [vco] appears to have started a thread named [vcoSystemTaskScheduler-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
    java.base@<version>/java.io.UnixFileSystem.delete0(Native Method)
    java.base@<version>/java.io.UnixFileSystem.delete(UnixFileSystem.java:276)
    java.base@<version>/java.io.File.delete(File.java:1064)
    org.eclipse.jgit.internal.storage.file.ObjectDirectoryPackParser.cleanupTemporaryFiles(ObjectDirectoryPackParser.java:312)
    org.eclipse.jgit.internal.storage.file.ObjectDirectoryPackParser.parse(ObjectDirectoryPackParser.java:192)
    org.eclipse.jgit.transport.PackParser.parse(PackParser.java:495)
    org.eclipse.jgit.transport.BasePackFetchConnection.receivePack(BasePackFetchConnection.java:1016)
    org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:394)
    org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:301)
    org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:292)
    org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:273)
    org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:170)
    org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:93)
    org.eclipse.jgit.transport.Transport.fetch(Transport.java:1309)
    org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:213)
    com.vmware.o11n.git.StagingRepositoryService.checkout(StagingRepositoryService.java:346)
    com.vmware.o11n.git.StagingRepositoryService.checkout(StagingRepositoryService.java:314)
    com.vmware.o11n.service.version.ContentVersionRepositoryFactoryImpl.initializeRepo(ContentVersionRepositoryFactoryImpl.java:53)
    com.vmware.o11n.service.version.ContentVersionRepositoryFactoryImpl.createBareRepository(ContentVersionRepositoryFactoryImpl.java:99)
  • The /data/vco/usr/lib/vco/app-server/data/git/__SYSTEM.git/ directory is several GB in size.
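
The last two symptoms can be checked from a shell on an appliance node. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, the paths are the ones listed above, and the exact layout of the "Checkout took" message (digits directly followed by "ms") is an assumption.

```shell
# Paths taken from the symptoms above; override them via the environment if needed.
LOG_DIR="${LOG_DIR:-/services-logs/prelude/vco-app/file-logs}"
REPO_DIR="${REPO_DIR:-/data/vco/usr/lib/vco/app-server/data/git/__SYSTEM.git}"

# Hypothetical helper: print the size of a directory in whole megabytes
# (du -sk reports kilobytes, awk truncates to an integer).
dir_size_mb() {
    du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

# Hypothetical helper: print every "Checkout took <n>ms" fragment found
# in the files under a directory.
checkout_times() {
    grep -rhoE 'Checkout took [0-9]+ ?ms' "$1" 2>/dev/null
}

# Report only when the directories exist, so the script is harmless elsewhere.
if [ -d "$REPO_DIR" ]; then
    echo "__SYSTEM.git size: $(dir_size_mb "$REPO_DIR") MB"
fi
if [ -d "$LOG_DIR" ]; then
    checkout_times "$LOG_DIR" | tail -n 5
fi
```

A size in the thousands of MB together with checkout times of many seconds would match the symptoms described in this article.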



Environment

VMware Aria Automation Orchestrator 8.x
VMware Aria Automation 8.x

Cause

A very large Git repository on a VMware Aria Automation or Automation Orchestrator appliance can prevent the services from starting in a timely fashion. A typical Git repository is several MB in size; this issue can occur when the repository grows to gigabytes.
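
The exit code in the title can be decoded with simple arithmetic: by common shell convention, a process stopped by a signal reports 128 plus the signal number, so 143 corresponds to signal 15 (SIGTERM). In other words, the pod was asked to stop rather than crashing by itself. A minimal sketch; the function name is hypothetical:

```shell
# Hypothetical helper: map an exit code above 128 to the signal number
# that ended the process; print 0 for ordinary exit codes.
signal_from_exit_code() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}

signal_from_exit_code 143   # prints 15, the number of SIGTERM on Linux
```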

Excessive REST API requests that create configuration elements can cause the Git repository to grow. For example, repeated calls to: POST /vco/api/configurations

Instead of creating these objects through REST API calls, consider creating them through the Orchestrator scripting API, which does not require such Git commits. For example, the following call, run from a scripting element in a workflow, creates a configuration element without a Git commit:

Server.createConfigurationElement("root", "test"); 

Resolution

To resolve the issue, perform a manual garbage collection on the Git repository.

Note: Take a snapshot of all nodes before performing the following procedure.

Prerequisites

  • Another Linux system with Java (JRE) installed is required to perform the garbage collection.
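
Whether the Linux system meets this prerequisite can be checked before starting. A minimal sketch; the helper name is hypothetical:

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java -version writes to stderr, so redirect it to show the first line.
    java -version 2>&1 | head -n 1
else
    echo "java not found: install a JRE before continuing" >&2
fi
```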

Procedure

  1. Download the JGit CLI client (the commands below use org.eclipse.jgit.pgm-6.4.0.202211300538-r.sh) and upload it to your Linux system.
  2. Copy the /data/vco/usr/lib/vco/app-server/data/git/__SYSTEM.git directory from the appliance to the Linux system that has the JGit client.
  3. On the Linux system, run garbage collection on the Git repository:
    chmod +x org.eclipse.jgit.pgm-6.4.0.202211300538-r.sh
    ./org.eclipse.jgit.pgm-6.4.0.202211300538-r.sh gc --aggressive --prune-preserved
  4. Copy the cleaned repository back to every node of your Automation Orchestrator appliance.
  5. Stop all vco-app pods:
    kubectl -n prelude scale deployment vco-app --replicas=0
  6. On each node, remove the /data/vco/usr/lib/vco/app-server/data/git/__SYSTEM.git directory and replace it with the cleaned __SYSTEM.git directory.
  7. Restart all vco-app pods:
    kubectl -n prelude scale deployment vco-app --replicas=3