Multiple Edge processes are repeatedly crashing and generating cores.

Article ID: 345928

Updated On:

Products

VMware NSX Networking

Issue/Introduction

This article raises awareness of a critical kernel memory leak on the Edge that exhausts available memory and causes multiple Edge processes to crash repeatedly.

 


Symptoms:
  • The Edge is very low on available memory and processes are crashing due to failed memory allocations.
  • The /var/log/syslog file shows logs like the following:

2023-04-28T18:33:15.944295+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723467] audit: audit_backlog=1187323 > audit_backlog_limit=8192

2023-04-28T18:33:15.944297+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723469] audit: audit_lost=17229471 audit_rate_limit=0 audit_backlog_limit=8192

2023-04-28T18:33:15.944298+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723470] audit: backlog limit exceeded

 

  • The Edge generated multiple core files under /var/log/core/, for example:

core.datapathd

core.lb-dispatcher
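
The symptoms above can be checked from a shell with a short sketch like the following. The function names are hypothetical helpers, not part of NSX; the default paths are the ones quoted in this article, and an alternate path can be passed to run the check against a copied log or support bundle.

```shell
#!/bin/sh
# Hypothetical helpers (not NSX commands) for spotting the symptoms listed above.

# Count "backlog limit exceeded" audit messages in a syslog file.
# Default path is the one named in this article.
count_audit_overflow() {
    log_file="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches or the file is missing, so ignore the exit status.
    grep -c 'audit: backlog limit exceeded' "$log_file" 2>/dev/null || true
}

# List core files in a directory (default: /var/log/core), if any exist.
list_cores() {
    core_dir="${1:-/var/log/core}"
    ls -1 "$core_dir"/core.* 2>/dev/null || true
}
```

Called with no arguments, `count_audit_overflow` reads /var/log/syslog and `list_cores` reads /var/log/core. A non-zero count together with core files such as core.datapathd is consistent with the issue described here, but does not prove it on its own.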


Environment

VMware NSX-T Data Center

Cause

The kernel contains a defect in how audit backlog messages are queued. Because of this defect, the backlog queue can grow beyond its configured maximum length (audit_backlog_limit): new entries are added to the queue but none are ever removed. Eventually the Edge runs completely out of memory, and multiple processes encounter memory allocation errors and crash.
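
To illustrate the queue growing past its limit, the sketch below extracts the most recent audit_backlog and audit_backlog_limit values from a log file and compares them. It assumes the kernel message format shown in the Symptoms section; the function names are hypothetical and not part of NSX.

```shell
#!/bin/sh
# Hypothetical helpers: read the latest audit backlog values from a syslog file.
# Assumes lines of the form:
#   audit: audit_backlog=<n> > audit_backlog_limit=<m>

# Print "<backlog> <limit>" for the last matching line, or nothing if none.
latest_audit_backlog() {
    log_file="${1:-/var/log/syslog}"
    sed -n 's/.*audit_backlog=\([0-9][0-9]*\) > audit_backlog_limit=\([0-9][0-9]*\).*/\1 \2/p' \
        "$log_file" 2>/dev/null | tail -n 1
}

# Succeed (exit 0) only if the latest backlog value exceeds the limit.
backlog_over_limit() {
    result=$(latest_audit_backlog "$1")
    [ -n "$result" ] || return 1
    backlog=${result% *}   # text before the space
    limit=${result#* }     # text after the space
    [ "$backlog" -gt "$limit" ]
}
```

For the first sample line in the Symptoms section, `latest_audit_backlog` prints `1187323 8192`, a backlog far above the configured limit.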

 

Resolution

This issue is resolved in NSX-T 3.2.3.


Workaround:
  • Root access is required. Use 'start engineer' to access the root shell.
  • Run the following commands to update the grub configuration:

sed -i '/GRUB_CMDLINE_LINUX/s/audit=1//' /etc/default/grub

update-grub

 

  • Then reboot the Edge for the change to take effect.
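
As a cautious complement to the workaround steps, the sketch below previews what the sed expression changes without modifying the file, and checks after the reboot that audit=1 is no longer on the running kernel's command line. The function names are hypothetical; /proc/cmdline is the standard Linux location of the boot command line, and its use here is an assumption about how to verify the change, not a step from this article.

```shell
#!/bin/sh
# Hypothetical helpers (not NSX commands) around the workaround above.

# Dry run: apply the article's sed expression without -i, so the file is
# not modified, and show the resulting GRUB_CMDLINE_LINUX* lines.
preview_grub_change() {
    src="${1:-/etc/default/grub}"
    sed '/GRUB_CMDLINE_LINUX/s/audit=1//' "$src" | grep 'GRUB_CMDLINE_LINUX'
}

# After the reboot: succeed (exit 0) if audit=1 is absent from the
# kernel command line of the running system.
audit_param_absent() {
    cmdline_file="${1:-/proc/cmdline}"
    ! grep -qw 'audit=1' "$cmdline_file"
}
```

Running `preview_grub_change` before the sed -i command shows the line as it will be written; running `audit_param_absent && echo "audit=1 removed"` after the reboot confirms the change took effect.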

 


Additional Information

Impact/Risks:

Multiple Edge processes repeatedly crash and generate cores, causing dataplane impact.