In a VCD environment with RabbitMQ configured, the following error occurs: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

Article ID: 391249

Products

VMware Cloud Director

Issue/Introduction

A task in VCD fails unexpectedly.

vcloud-container-debug.log contains the following error around the time the task failed:

2024-11-20 23:06:32,234 | ERROR    | processor-Backend         | DefaultActivityQueueProcessor  | Unxpected error submitting activity com.vmware.ssdc.backend.services.impl.CreateDiskActivity/urn:uuid:########-####-####-####-############ to activity template ActivityTemplate [activityExecutor=com.vmware.vcloud.activity.executors.PersistentActivityExecutor@55b43c68, activityProvider=com.vmware.vcloud.activity.toolkit.SpringActivityProvider@1ec4989e]. Will not retry, clearing queue element. | 
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:798)
        at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
        at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
        at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
        at com.vmware.vcloud.activity.executors.LocalActivityExecutor.submit(LocalActivityExecutor.java:368)
        at com.vmware.vcloud.activity.executors.PersistentActivityExecutor.innerSubmit(PersistentActivityExecutor.java:236)
        at com.vmware.vcloud.activity.executors.PersistentActivityExecutor.submit(PersistentActivityExecutor.java:155)
        at com.vmware.vcloud.activity.toolkit.ActivityTemplate.run(ActivityTemplate.java:245)
        at com.vmware.vcloud.activity.toolkit.ActivityTemplate.run(ActivityTemplate.java:215)
        at com.vmware.vcloud.activity.toolkit.ActivityTemplate.run(ActivityTemplate.java:151)
        at com.vmware.vcloud.activity.toolkit.queueing.DefaultActivityQueueProcessor.submitElement(DefaultActivityQueueProcessor.java:388)
        at com.vmware.vcloud.activity.toolkit.queueing.DefaultActivityQueueProcessor$1.run(DefaultActivityQueueProcessor.java:194)

In addition, vmware-vcd-watchdog.log shows that the vmware-vcd-cell service was restarted:

2024-11-20 23:05:54 | INFO  | vmware-vcd-cell running
2024-11-20 23:06:55 | ALERT | vmware-vcd-cell is dead but /var/run/vmware-vcd-cell.pid exists, attempting to restart it
2024-11-20 23:07:05 | INFO  | Started vmware-vcd-cell (pid=3955)
2024-11-20 23:07:06 | WARN  | Server status returned HTTP/1.1 404
2024-11-20 23:08:06 | WARN  | Server status returned HTTP/1.1 503
2024-11-20 23:09:06 | WARN  | Server status returned HTTP/1.1 503
2024-11-20 23:11:06 | INFO  | vmware-vcd-cell running

Environment

VMware Cloud Director 10.5
VMware Cloud Director 10.6

Note: AMQP-based functionality still works in VCD 10.6, but it is deprecated as of VCD 10.6 and is no longer supported. See the product documentation.

Cause

When the connection between VCD (the RabbitMQ client) and the RabbitMQ server is forcibly disconnected, the AMQP thread for that connection is not released and remains in the VCD cell.
In environments where forced disconnections occur periodically, for example when a load balancer drops connections, these leftover threads accumulate until VCD hits "java.lang.OutOfMemoryError: unable to create native thread", and the vmware-vcd-cell service is restarted repeatedly.
How often this happens depends on the environment. In the environment where the issue was confirmed, the cells were affected in turn, with one cell restarting every few days.
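
Because the error is about native threads rather than heap, it can help to watch the total thread count of the cell process over time. The following is a minimal sketch, assuming a Linux cell where the kernel exposes a "Threads:" field in /proc/<pid>/status. The function name and the CELL_PID variable are illustrative and are not part of VCD.

```shell
#!/bin/sh
# Sketch only: report how many threads a process has, using the
# "Threads:" field of a /proc/<pid>/status-style file (Linux).

# thread_count_from_status FILE
# Prints the value of the "Threads:" line in FILE.
thread_count_from_status() {
    awk '/^Threads:/ { print $2 }' "$1"
}

# Example (assumption: CELL_PID holds the PID of the vmware-vcd-cell JVM):
#   thread_count_from_status "/proc/$CELL_PID/status"
```

A count that keeps growing between cell restarts would be consistent with the thread leak described above, although it does not by itself show that the extra threads are AMQP threads.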

Resolution

There is currently no fix.
Resolve whatever is causing the connection between VCD and the RabbitMQ Server to be dropped.
Alternatively, configure MQTT instead of AMQP.

Workaround:

Count the AMQP-related threads in a thread dump. If the count exceeds 1000, restart the vmware-vcd-cell service manually.

# /opt/vmware/vcloud-director/bin/cell-management-tool support -i $(service vmware-vcd pid cell) -t | grep -c "AMQP Connection <RabbitMQ Server IP>:5672"
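
The check above can be wrapped in a small script so that the comparison against the 1000-thread guideline is explicit. This is a sketch under stated assumptions: the function names are hypothetical, and the pattern matches any "AMQP Connection" thread rather than one specific RabbitMQ Server IP and port.

```shell
#!/bin/sh
# Sketch only: count AMQP connection threads in a thread dump and
# compare the count with the 1000-thread guideline from this article.

# count_amqp_threads
# Reads a thread dump on stdin and prints the number of lines that
# contain "AMQP Connection". grep -c prints 0 but exits non-zero when
# nothing matches, so the exit status is normalized with "|| true".
count_amqp_threads() {
    grep -c "AMQP Connection" || true
}

# restart_recommended COUNT [THRESHOLD]
# Succeeds (exit 0) when COUNT is greater than THRESHOLD (default 1000).
restart_recommended() {
    [ "$1" -gt "${2:-1000}" ]
}

# Example, using the thread dump command shown above:
#   count=$(/opt/vmware/vcloud-director/bin/cell-management-tool support \
#             -i $(service vmware-vcd pid cell) -t | count_amqp_threads)
#   if restart_recommended "$count"; then
#       echo "AMQP threads: $count, consider restarting vmware-vcd-cell"
#   fi
```

The script only reports. Restarting the cell is left as a manual step, as described in the workaround.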

Mitigation:

Shortening the AMQP heartbeat interval may reduce disconnections by the load balancer (LB), because more frequent heartbeats make the connection less likely to look idle to the LB.

  1. Check the current AMQP heartbeat settings
    # /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n amqp.heartbeat -l

  2. Set the AMQP heartbeat value (the example below sets it to 30 seconds)
    # /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n amqp.heartbeat -v 30
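
When choosing a heartbeat value, a common rule of thumb is to keep it below the load balancer's idle timeout. The sketch below only performs that comparison. The function name is hypothetical, the LB idle timeout has to be taken from the load balancer's own configuration, and treating 0 as "heartbeats disabled" is an assumption based on general AMQP behavior, not on VCD documentation.

```shell
#!/bin/sh
# Sketch only: sanity-check a candidate amqp.heartbeat value (seconds)
# against a load balancer idle timeout (seconds).

# heartbeat_below_idle_timeout HEARTBEAT_SECONDS LB_IDLE_TIMEOUT_SECONDS
# Succeeds (exit 0) when the heartbeat is positive and strictly
# smaller than the idle timeout.
heartbeat_below_idle_timeout() {
    [ "$1" -gt 0 ] && [ "$1" -lt "$2" ]
}

# Example (assumption: the LB drops idle connections after 60 seconds):
#   heartbeat_below_idle_timeout 30 60 && echo "30s heartbeat fits"
```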

Alternatively, remove the load balancer and connect VCD directly to the RabbitMQ Server.