Multiple Manager and Edge nodes are experiencing memory exhaustion, leading to repeated process crashes and core dump

Article ID: 345928

Updated On:

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • The NSX Manager appliance or NSX Edge node has very little available memory, and processes crash because memory allocations fail. Alarms for high memory utilization also appear in /var/log/syslog:
2025-05-26T10:24:05.130Z Example-EDGENODE NSX 18971 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_memory_usage_very_high" eventSev="critical" eventState="On"] The memory usage on Edge node #######-####-####-####-############ has reached 99% which is at or above the very high threshold value of 90%.
ag:02.428Z Example-EDGENODE NSX NSX 28058 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_memory_usage_very_high" eventSev="critical" eventState="On"] The memory usage on Edge node #######-####-####-####-############ has reached 99% which is at or above the very high threshold value of 90%.
  • Log lines similar to the following appear in /var/log/syslog:
    kernel - - - [13094823.723467] audit: audit_backlog=1187323 > audit_backlog_limit=8192
    kernel - - - [13094823.723469] audit: audit_lost=17229471 audit_rate_limit=0 audit_backlog_limit=8192
    kernel - - - [13094823.723470] audit: backlog limit exceeded
  • In the VM console of an NSX Edge node you might see:
    grsec: failed fork with errno ENOMEM by ...
  • Multiple core dumps are generated on the NSX Manager in /var/log/core/:
    core.java
    core.opsAgent
    core.perl
    core.python3
  • Multiple core dumps are generated on the NSX Edge node in /var/log/core/:
    core.datapathd
    core.lb-dispatcher

Note: The preceding log excerpts are only examples. Dates, times, and other values will vary depending on your environment.
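
As a quick way to check a node for the audit symptom listed above, the following is a minimal sketch that counts the "backlog limit exceeded" messages in a syslog file and prints the most recent backlog counter line. The function names are hypothetical and not part of NSX; the default path is the /var/log/syslog location mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers (not NSX tooling): look for the audit backlog
# symptom in a syslog file. Both default to /var/log/syslog.

# Print how many "backlog limit exceeded" lines the file contains (0 if none).
count_backlog_overflows() {
    log_file="${1:-/var/log/syslog}"
    # grep -c prints the match count; "|| true" hides its non-zero
    # exit status when there are no matches.
    grep -c 'audit: backlog limit exceeded' "$log_file" || true
}

# Print the most recent line that reports the current backlog length.
last_backlog_line() {
    log_file="${1:-/var/log/syslog}"
    grep 'audit: audit_backlog=' "$log_file" | tail -n 1
}
```

Calling `count_backlog_overflows` with no argument reads /var/log/syslog; a non-zero count together with the memory alarms suggests the node matches this article.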

Environment

VMware NSX-T Data Center
VMware NSX

Cause

The kernel contains a defect in how it queues audit backlog messages. As a result, the backlog queue can grow beyond its configured maximum length (audit_backlog_limit): new entries are added to the queue, but none are ever removed. Eventually the Manager or Edge node runs out of memory entirely, and multiple processes hit memory allocation errors and crash.
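
To illustrate the condition described here, the sketch below extracts the audit_backlog and audit_backlog_limit values from a kernel log line like the one shown under Issue/Introduction and reports whether the queue has grown past its limit. The function name is hypothetical, and the parsing assumes the log format shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a kernel audit log line shows
# a backlog larger than the configured limit, fail (exit 1) otherwise.
backlog_exceeds_limit() {
    line="$1"
    # Pull the two counters out of the line; either is empty if absent.
    backlog=$(printf '%s\n' "$line" | sed -n 's/.*audit_backlog=\([0-9][0-9]*\).*/\1/p')
    limit=$(printf '%s\n' "$line" | sed -n 's/.*audit_backlog_limit=\([0-9][0-9]*\).*/\1/p')
    # Only compare when both values were found.
    [ -n "$backlog" ] && [ -n "$limit" ] && [ "$backlog" -gt "$limit" ]
}
```

For the example line in this article (audit_backlog=1187323, audit_backlog_limit=8192), the function succeeds because the queue is far above its limit.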

Resolution

This issue is resolved in VMware NSX-T 3.2.3 and VMware NSX 4.1.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.


Workaround:

  • Root access is required: Use 'start engineer' to access the root shell from admin shell, or log in as root directly on the NSX appliance.
  • Run the following commands to update the grub configuration:
    sed -i '/GRUB_CMDLINE_LINUX/s/audit=1//' /etc/default/grub
    update-grub
  • Then reboot the NSX appliance.
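
Before applying the workaround, it can help to preview what the sed expression will change, and after the reboot to confirm that the kernel is no longer booted with audit=1. The following is a minimal sketch under those assumptions; the function names are hypothetical, and /proc/cmdline is the standard Linux file holding the running kernel's boot parameters.

```shell
#!/bin/sh
# Hypothetical helpers around the workaround above.

# Show what the workaround's sed expression would change, without
# modifying the file. Lines marked "<" are current, ">" are the result.
# diff exits 1 when the two versions differ.
preview_grub_change() {
    grub_file="${1:-/etc/default/grub}"
    sed '/GRUB_CMDLINE_LINUX/s/audit=1//' "$grub_file" | diff "$grub_file" -
}

# After the reboot: succeed (exit 0) if the kernel was still booted
# with audit=1, fail (exit 1) if the parameter is gone.
audit_param_active() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -qw 'audit=1' "$cmdline_file"
}
```

Running `preview_grub_change` before the sed -i command shows the affected line, and `audit_param_active || echo "audit=1 removed"` after the reboot confirms the change took effect. Taking a copy of /etc/default/grub first is a reasonable precaution.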