vCenter services experience OutOfMemoryError: Java heap space, filling swap memory to 100% and causing the vCenter backup to fail with "Error: database or disk is full"

Article ID: 396939


Products

VMware vCenter Server

Issue/Introduction

  • A memory leak in the vc-ws1a-broker service fills the swap space.
  • With swap almost 100% utilized, other services such as observability and vmware-postgres-archiver are affected; both end up in STOPPED status because of memory allocation failures.
  • /tmp is a tmpfs mounted in memory and backed by the swap area. The VAMI-based backup writes to /tmp (/tmp/backup_stellardb/stellar.db), so when swap is full the backup fails with the error "database or disk is full".
This can be confirmed by running the free command.
Example:
# free
               total        used        free      shared  buff/cache   available
Mem:        21502260    10915648      186524      707668    10400088     9486496
Swap:       26206204    25806436      399768    <<< almost 98% used
  • Identify the process ID (PID) that is leaking memory and consuming most of the swap space.
The following command lists processes by swap usage, with the largest consumer last.
Example:
root@vcenter [ ~ ]# grep Swap /proc/*/status | sort -n -k 2
/proc/1165384/status:VmSwap:         0 kB
/proc/1166238/status:VmSwap:         0 kB
--------------------------------------
/proc/10168/status:VmSwap:        525780 kB
/proc/5913/status:VmSwap:          585168 kB
/proc/1234/status:VmSwap:        20787736 kB     <<< PID 1234 occupies most of the swap area
root@vcenter [ ~ ]# ps -auwwx | grep 1234         <<< PID 1234 is the idmserv+ (accesscontrol service) process
idmserv+    1234 46.1  3.5 23249320 768572 ?     Sl    2024 141353:11 java -Dlog4jRootFolder=/opt/vmware/idm/accesscontrol/logs -Dlog4j.configurationFile=log4j2-base.xml,log4j2-accesscontrol.xml,/opt/vmware/idm/accesscontrol/config/log4j2-override.xml -Dlog4jRequestContextKeysToMdc=clientId,grantType -
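The two checks above can be combined into a small Bash sketch. It assumes a Bash shell on the appliance and reads the same kernel data that free and /proc/*/status expose; swap_used_pct and top_swap_users are hypothetical helper names, not VMware tools. Unlike the grep | sort pipeline, it also prints each process's command name from /proc/<PID>/comm, so a separate ps lookup is not needed.

```shell
#!/bin/bash
# Hypothetical helpers (not VMware tools) that read the same kernel data as
# `free` and the `grep Swap /proc/*/status` command shown above.

# Print swap usage as a whole-number percentage, from /proc/meminfo by default.
swap_used_pct() {
  awk '/^SwapTotal:/ { t = $2 }
       /^SwapFree:/  { f = $2 }
       END { if (t > 0) printf "%d\n", (t - f) * 100 / t; else print 0 }' \
    "${1:-/proc/meminfo}"
}

# Print the top N swap users as "<VmSwap kB> <PID> <command name>", largest first.
top_swap_users() {
  local n=${1:-5} status_file pid kb
  for status_file in /proc/[0-9]*/status; do
    # Skip processes that exited while we were scanning.
    kb=$(awk '/^VmSwap:/ { print $2 }' "$status_file" 2>/dev/null) || continue
    [ -n "$kb" ] || continue              # kernel threads have no VmSwap line
    pid=${status_file#/proc/}
    pid=${pid%/status}
    printf '%s %s %s\n' "$kb" "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
  done | sort -rn | head -n "$n"
}
```

For the free output shown above, swap_used_pct would be expected to print 98, and top_swap_users 3 would list PID 1234 (a java process) first.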

 

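The vMon log shows the vmware-postgres-archiver pre-start command failing because fork cannot allocate memory:

less /var/log/vmware/vmon/vmon.log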

YYYY-MM-DDT16:00:23.640Z In(05) host-1234 <vmware-postgres-archiver-prestart> Constructed command: /opt/vmware/vpostgres/current/scripts/pg_archiver_pre_start vpg_archiver 300
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 fork failed. Cannot allocate memory
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 <vmware-postgres-archiver> Service pre-start command could not started.
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 <vmware-postgres-archiver> Service reached max quick failure count. Give up!!!
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 fork failed. Cannot allocate memory
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 <vmware-postgres-archiver> Service pre-start command could not started.
YYYY-MM-DDT16:00:23.640Z Wa(03) host-1234 <vmware-postgres-archiver> Service failed to recover. Try again. Fail count 1
YYYY-MM-DDT16:00:23.640Z In(05) host-1234 <vmware-postgres-archiver-prestart> Constructed command: /opt/vmware/vpostgres/current/scripts/pg_archiver_pre_start vpg_archiver 300
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 fork failed. Cannot allocate memory
YYYY-MM-DDT16:00:23.640Z Er(02) host-1234 <vmware-postgres-archiver> Service pre-start command could not started.
YYYY-MM-DDT16:00:23.640Z Wa(03) host-1234 <vmware-postgres-archiver> Service failed to recover. Try again. Fail count 2
YYYY-MM-DDT16:00:23.640Z In(05) host-1234 <vmware-postgres-archiver-prestart> Constructed command: /opt/vmware/vpostgres/current/scripts/pg_archiver_pre_start vpg_archiver 300
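To gauge how often vMon hit this condition, a minimal sketch like the following counts the two failure messages shown above. vmon_alloc_failures is a hypothetical helper; it assumes the default log path and matches the message text literally, including vMon's own "could not started" wording.

```shell
#!/bin/bash
# Hypothetical helper: count the memory-allocation failures that vMon logged
# for vmware-postgres-archiver. grep -F matches the messages literally,
# including the "could not started" wording exactly as vMon writes it.
vmon_alloc_failures() {
  local log=${1:-/var/log/vmware/vmon/vmon.log} forks prestarts
  forks=$(grep -cF 'fork failed. Cannot allocate memory' "$log")
  prestarts=$(grep -cF '<vmware-postgres-archiver> Service pre-start command could not started' "$log")
  printf 'fork failures: %s\narchiver pre-start failures: %s\n' "$forks" "$prestarts"
}
```

Against the excerpt above, it would report 3 of each.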

 

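The vc-ws1a-broker accesscontrol service log shows repeated java.lang.OutOfMemoryError: Java heap space errors:

less /var/log/vmware/vc-ws1a-broker/accesscontrol-service.log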

YYYY-MM-DDT23:44:02,266 ERROR vcenter.vsphere.local:accesscontrol (vert.x-eventloop-thread-3) [-;-;-;-;-;-;-] io.vertx.core.impl.ContextBase - Unhandled exception java.lang.OutOfMemoryError: Java heap space
YYYY-MM-DDT23:44:02,487 ERROR vcenter.vsphere.local:accesscontrol (vert.x-eventloop-thread-5) [-;-;-;-;-;-;-] io.vertx.core.impl.ContextBase - Unhandled exception java.lang.OutOfMemoryError: Java heap space
YYYY-MM-DDT07:01:05,133 INFO  vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] com.vmware.vidm.accesscontrol.db.OAuth2AuthorizationCodeDataServiceImpl - Starting purge of expired authorization codes
YYYY-MM-DDT07:01:44,126 WARN  vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: null
YYYY-MM-DDT07:01:44,126 ERROR vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] org.hibernate.engine.jdbc.spi.SqlExceptionHelper - An attempt by a client to checkout a Connection has timed out.
YYYY-MM-DDT07:02:08,136 ERROR vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] com.vmware.vidm.accesscontrol.db.OAuth2AuthorizationCodeDataServiceImpl - Caught exception during scheduled purging of authorization codes java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
        at com.vmware.vidm.common.async.ContextPassingExecutor.lambda$wrap$0(ContextPassingExecutor.java:48)
        at io.micrometer.core.instrument.internal.TimedRunnable.run(TimedRunnable.java:49)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
YYYY-MM-DDT07:02:13,679 ERROR vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] com.vmware.vidm.accesscontrol.db.AbstractDataServiceImpl - Unexpected exception java.lang.OutOfMemoryError: Java heap space
YYYY-MM-DDT07:01:07,835 INFO  vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] com.vmware.vidm.accesscontrol.db.OAuth2AuthorizationCodeDataServiceImpl - Starting purge of expired authorization codes
YYYY-MM-DDT07:01:44,482 WARN  vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: null
YYYY-MM-DDT07:01:44,482 ERROR vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] org.hibernate.engine.jdbc.spi.SqlExceptionHelper - An attempt by a client to checkout a Connection has timed out.
YYYY-MM-DDT07:02:52,604 ERROR vcenter.vsphere.local:accesscontrol (acs-rds-db-ops) [-;-;-;-;-;-;-] com.vmware.vidm.accesscontrol.db.DbDataStoreAutoConfiguration - ALERT!! Thread acs-rds-db-ops threw an uncaught exception java.lang.OutOfMemoryError: Java heap space
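A similar sketch can tally the OutOfMemoryError lines by the thread name in parentheses, which shows whether both the Vert.x event-loop threads and the acs-rds-db-ops purge thread are running out of heap. oom_by_thread is a hypothetical helper, and it assumes each log entry sits on a single line in the real log file.

```shell
#!/bin/bash
# Hypothetical helper: tally "Java heap space" OutOfMemoryError lines in the
# accesscontrol log by the thread name shown in parentheses, for example
# (vert.x-eventloop-thread-3) or (acs-rds-db-ops). Lines without a
# parenthesized thread name (such as "Caused by: ...") are not counted.
oom_by_thread() {
  local log=${1:-/var/log/vmware/vc-ws1a-broker/accesscontrol-service.log}
  sed -n 's/.*(\([^)]*\)).*java\.lang\.OutOfMemoryError: Java heap space.*/\1/p' "$log" |
    sort | uniq -c | sort -rn
}
```

For the excerpt above (with each entry on one line), acs-rds-db-ops would top the list with 3 entries, followed by the two event-loop threads with 1 each.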

 

Environment

VMware vCenter 8.0 U2a and later

Cause

A memory leak occurs in certain situations while the service handles tokens. Setting the token cleanup limit and batch-size parameters described in the Resolution section helps avoid the leak.
This is a known bug in vCenter 8.0 U2a and later and is expected to be addressed in a future release.

Resolution

To work around this issue, set the parameters using the following procedure:
  • Log in to the vCenter Server via SSH and switch to the Bash shell.
  • Using vi or a similar editor, edit the following configuration file:
    • /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/12/fs/opt/vmware/idm/initc/services/token/config/application.properties 
    • Note: Modify the properties file in the highest-numbered snapshot directory.
Example:
  • You may find multiple snapshot folders that contain application.properties, such as the two below.

/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/11/fs/opt/vmware/idm/initc/services/token/config/application.properties
/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/12/fs/opt/vmware/idm/initc/services/token/config/application.properties
  • Add the following three parameters to application.properties in the highest-numbered snapshot (12 in this example):
    token.delete.expired.tokens.limit=100
    revocation.delete.old.tombstones.limit=100
    revoke.by.oauthclientid.batch.size=500
  • Run the following command to restart the service:
    # service-control --restart vc-ws1a-broker

Note:
You can use the find command to locate the application.properties file for the token application within the container filesystem snapshots:

Example:
# cd /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
# find -iname application.properties | grep token
./13/fs/opt/vmware/idm/initc/services/token/config/application.properties
./12/fs/opt/vmware/idm/initc/services/token/config/application.properties

Important Note:
The filesystem where application.properties is located is an overlay filesystem that is essential to the vc-ws1a-broker service configuration. Placing unnecessary files in this location corrupts the service's configuration information.

Do not create files (such as backups or temporary files) on the same filesystem as application.properties. Use another directory, such as /var/core, for temporary operations.

See KB 416157 for details.
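The workaround steps above can be sketched as two Bash functions, under these assumptions: the snapshot layout matches the paths in this article, snapshot directory names are plain numbers, and latest_token_props and add_token_params are hypothetical helper names. The sketch appends to the existing file with >>, so it creates no backup or temporary file inside the overlay filesystem; a copy, if wanted, goes to /var/core as the Important Note advises.

```shell
#!/bin/bash
# Sketch of the workaround steps; function names are hypothetical.
SNAP_ROOT=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
TOKEN_REL=fs/opt/vmware/idm/initc/services/token/config/application.properties

# Print the token application.properties path in the highest-numbered snapshot.
latest_token_props() {
  local root=${1:-$SNAP_ROOT} f n best=""
  for f in "$root"/*/"$TOKEN_REL"; do
    [ -e "$f" ] || continue                  # glob matched nothing
    n=${f#"$root"/}
    n=${n%%/*}                               # snapshot number, e.g. 12
    case $n in ''|*[!0-9]*) continue ;; esac # ignore non-numeric names
    # Numeric comparison, so 12 beats 9 even though "9" sorts later as text.
    if [ -z "$best" ] || [ "$n" -gt "$best" ]; then best=$n; fi
  done
  [ -n "$best" ] && printf '%s\n' "$root/$best/$TOKEN_REL"
}

# Append the three workaround parameters unless the key is already present.
# ">>" writes to the existing file in place: no backup or temporary file is
# created in the overlay filesystem.
add_token_params() {
  local target=$1 kv key
  [ -f "$target" ] || return 1
  # Terminate the last existing line so the first new entry starts on its own line.
  [ -z "$(tail -c 1 "$target")" ] || printf '\n' >> "$target"
  for kv in token.delete.expired.tokens.limit=100 \
            revocation.delete.old.tombstones.limit=100 \
            revoke.by.oauthclientid.batch.size=500; do
    key=${kv%%=*}
    # Exact match on the text before the first "=".
    if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$target"; then
      printf '%s\n' "$kv" >> "$target"
    fi
  done
}
```

On the appliance, one possible sequence is f=$(latest_token_props), cp -p "$f" /var/core/, add_token_params "$f", tail -n 3 "$f" to review the result, and finally service-control --restart vc-ws1a-broker. Running add_token_params twice does not duplicate entries, because it skips any key that is already set; an existing entry is left unchanged even if its value differs.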

Additional Information

The vc-ws1a-broker service fails to start after creating a backup of the service's application.properties file and restarting the service