Multiple Manager and Edge nodes are experiencing memory exhaustion, leading to repeated process crashes and core dumps


Article ID: 345928


Updated On:

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

The NSX Manager appliance or NSX Edge node has very little available memory, so processes crash when their memory allocations fail. Alarms related to high memory utilization are also logged in /var/log/syslog:

```
2025-05-26T10:24:05.130Z Example-EDGENODE NSX 18971 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_memory_usage_very_high" eventSev="critical" eventState="On"] The memory usage on Edge node #######-####-####-####-############ has reached 99% which is at or above the very high threshold value of 90%.
ag:02.428Z Example-EDGENODE NSX NSX 28058 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_memory_usage_very_high" eventSev="critical" eventState="On"] The memory usage on Edge node #######-####-####-####-############ has reached 99% which is at or above the very high threshold value of 90%.
```
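To cross-check the alarm from a shell on the appliance, the following minimal sketch estimates memory usage from /proc/meminfo and compares it with the 90% very high threshold quoted in the alarm. The function name memory_usage_check is hypothetical, and the formula (MemTotal - MemAvailable) / MemTotal is an assumption: NSX may calculate its alarm value differently, so treat the result only as a rough indication.

```shell
#!/bin/sh
# Hypothetical helper, not an NSX tool: estimate memory usage from a
# meminfo-format file and compare it with a percentage threshold.
# Assumption: usage = (MemTotal - MemAvailable) / MemTotal; NSX may compute
# its edge_memory_usage_very_high value differently.
memory_usage_check() {
  meminfo="${1:-/proc/meminfo}"
  threshold="${2:-90}"
  awk -v threshold="$threshold" '
    /^MemTotal:/     { total = $2 }   # value in kB
    /^MemAvailable:/ { avail = $2 }   # value in kB
    END {
      if (total == 0)  { print "MemTotal not found"; exit 1 }
      if (avail == "") { print "MemAvailable not found"; exit 1 }
      used_pct = int((total - avail) * 100 / total)
      state = (used_pct >= threshold) ? "at or above" : "below"
      printf "memory usage %d%% is %s the %d%% threshold\n", used_pct, state, threshold
    }' "$meminfo"
}

# Example: memory_usage_check /proc/meminfo 90
```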

Log lines similar to the following appear in /var/log/syslog:

```
kernel - - - [13094823.723467] audit: audit_backlog=1187323 > audit_backlog_limit=8192
kernel - - - [13094823.723469] audit: audit_lost=17229471 audit_rate_limit=0 audit_backlog_limit=8192
kernel - - - [13094823.723470] audit: backlog limit exceeded
```
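The following sketch, assuming the kernel message format shown in the excerpt above, collects every "audit_backlog=N > audit_backlog_limit=M" line in a syslog file and prints how many there are, the first and last backlog values, and the limit. The function audit_backlog_summary is hypothetical and not part of NSX. A last value far higher than the first, and far above the limit, is consistent with a backlog that is not draining; on its own it does not prove this specific defect. Rotated syslog files can be passed as the argument as well.

```shell
#!/bin/sh
# Hypothetical helper, not an NSX tool: summarise the kernel
# "audit_backlog=N > audit_backlog_limit=M" messages in a syslog file.
audit_backlog_summary() {
  log_file="${1:-/var/log/syslog}"
  # Keep only the "audit_backlog=N > audit_backlog_limit=M" part of each
  # matching line, reduce it to "N M", then report first/last N and the limit.
  grep -oE 'audit_backlog=[0-9]+ > audit_backlog_limit=[0-9]+' "$log_file" \
    | sed -E 's/audit_backlog=([0-9]+) > audit_backlog_limit=([0-9]+)/\1 \2/' \
    | awk '
        NR == 1 { first = $1 }
        { last = $1; limit = $2 }
        END {
          if (NR == 0) { print "readings=0"; exit }
          printf "readings=%d first=%s last=%s limit=%s\n", NR, first, last, limit
        }'
}

# Example: audit_backlog_summary /var/log/syslog
```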

In the VM console of an NSX Edge node, you might see:
```
grsec: failed fork with errno ENOMEM by ...
```
The NSX Manager generates multiple core files in /var/log/core/, for example:
```
core.java
core.opsAgent
core.perl
core.python3
```
The NSX Edge node generates multiple core files in /var/log/core/, for example:
```
core.datapathd
core.lb-dispatcher
```
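A quick way to see which processes have dumped core is to count the files in /var/log/core/ by process name. This sketch assumes core files are named core.<process>, optionally followed by a further dot-separated suffix; that naming beyond the examples above, and the helper name list_core_processes, are assumptions. A process name that itself contains a dot would be cut off at that dot.

```shell
#!/bin/sh
# Hypothetical helper, not an NSX tool: count core files per process name
# in a core directory (default /var/log/core).
list_core_processes() {
  core_dir="${1:-/var/log/core}"
  for f in "$core_dir"/core.*; do
    [ -e "$f" ] || continue      # the glob matched nothing
    name=$(basename "$f")
    name="${name#core.}"         # drop the leading "core."
    name="${name%%.*}"           # drop everything from the next dot, if any
    printf '%s\n' "$name"
  done | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example: list_core_processes /var/log/core
```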

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware NSX-T Data Center
VMware NSX

Cause

The kernel contains a defect in how audit backlog messages are queued. Because of this defect, the audit backlog queue can grow beyond its configured maximum length (audit_backlog_limit, 8192 in the example log lines). New entries continue to be added to the queue, but none are ever removed. Eventually the NSX Manager or Edge node runs completely out of memory, and multiple processes fail memory allocations and crash.
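The difference between an enforced backlog limit and the behaviour described above can be illustrated with a toy model. In this sketch, the function simulate_backlog is hypothetical and does not reproduce the kernel's audit code. Nothing is ever dequeued, mirroring the statement that entries are never removed. With the limit enforced, the queue stops at the limit and further messages are counted as lost; with the limit ignored, the queue simply keeps growing. The model deliberately leaves out draining by the audit daemon and the partial dropping reflected in audit_lost.

```shell
#!/bin/sh
# Toy model only, not kernel code: queue $2 messages against a backlog limit
# of $1, with nothing ever dequeued. With $3 = yes the limit is enforced
# (messages beyond it are dropped and counted as lost); with $3 = no it is
# ignored, mimicking the unbounded growth described in the Cause section.
simulate_backlog() {
  limit="$1"
  messages="$2"
  enforce_limit="$3"
  backlog=0
  lost=0
  i=0
  while [ "$i" -lt "$messages" ]; do
    if [ "$enforce_limit" = yes ] && [ "$backlog" -ge "$limit" ]; then
      lost=$((lost + 1))         # limit reached: drop the message
    else
      backlog=$((backlog + 1))   # queue the message; it is never removed
    fi
    i=$((i + 1))
  done
  echo "backlog=$backlog lost=$lost"
}

# Example: simulate_backlog 8192 20000 yes   versus   simulate_backlog 8192 20000 no
```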

Resolution

This issue is resolved in VMware NSX-T 3.2.3 and VMware NSX 4.1.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
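To decide whether a given appliance already contains the fix, the version string it reports can be compared against the fixed releases. This sketch assumes that the fix is present in 3.2.3 and later releases below 4.0, and in 4.1.2 and later, and it treats every other version (for example 4.0.x or 4.1.1) as not fixed; confirm against the release notes for your version. The helpers version_ge and audit_fix_status are hypothetical and rely on sort -V (GNU coreutils) for version ordering.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 >= version $2 (sort -V order).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: fixed means 4.1.2 or later, or 3.2.3 or later but below 4.0.
# Every other version is reported as "not fixed".
audit_fix_status() {
  if version_ge "$1" 4.1.2; then
    echo fixed
  elif version_ge "$1" 3.2.3 && ! version_ge "$1" 4.0; then
    echo fixed
  else
    echo "not fixed"
  fi
}

# Example: audit_fix_status 4.1.2.0.0.12345678
```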

Workaround:

Root access is required. From the admin shell, use 'start engineer' to access the root shell, or log in to the NSX appliance directly as root.

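Run the following commands to update the GRUB configuration:

```
# Delete "audit=1" from each line of /etc/default/grub that mentions GRUB_CMDLINE_LINUX
sed -i '/GRUB_CMDLINE_LINUX/s/audit=1//' /etc/default/grub
# Regenerate the GRUB configuration so the change applies at the next boot
update-grub
```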

Then reboot the NSX appliance so that the updated kernel command line takes effect.
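After applying the workaround, it is worth confirming that audit=1 is really gone: before the reboot in /etc/default/grub, and after the reboot in /proc/cmdline, which shows the parameters the running kernel was booted with. The helper audit_param_status is hypothetical; it reports whether audit=1 appears as a standalone parameter anywhere in the file it is given, including comment lines.

```shell
#!/bin/sh
# Hypothetical helper: print "present" if audit=1 appears as a standalone
# parameter anywhere in the given file, otherwise "absent".
audit_param_status() {
  # audit=1 must be preceded by start of line, whitespace or a double quote,
  # and followed by whitespace, a double quote or end of line, so that a
  # value such as audit=10 does not match.
  if grep -qE '(^|[[:space:]"])audit=1([[:space:]"]|$)' "$1"; then
    echo present
  else
    echo absent
  fi
}

# Before reboot: audit_param_status /etc/default/grub
# After reboot:  audit_param_status /proc/cmdline
```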
