Multiple Manager/Edge processes are repeatedly crashing and generating cores.

Article ID: 345928



Products

VMware NSX

Issue/Introduction

  • The Manager/Edge is very low on available memory and processes are crashing due to failed memory allocations.
  • The /var/log/syslog file contains log entries like the following:

2023-04-28T18:33:15.944295+00:00 <Edge-name> kernel - - - [13094823.723467] audit: audit_backlog=1187323 > audit_backlog_limit=8192

2023-04-28T18:33:15.944297+00:00 <Edge-name> kernel - - - [13094823.723469] audit: audit_lost=17229471 audit_rate_limit=0 audit_backlog_limit=8192

2023-04-28T18:33:15.944298+00:00 <Edge-name> kernel - - - [13094823.723470] audit: backlog limit exceeded
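To check whether a node shows this signature, the log can be searched for the messages above. The following is a minimal sketch, not an official NSX tool: the function names count_backlog_exceeded and latest_backlog are hypothetical, and it assumes a grep that supports -o, as GNU grep does.

```shell
# Count kernel messages reporting that the audit backlog limit was exceeded.
# $1: log file to scan (defaults to /var/log/syslog, as named in this article)
count_backlog_exceeded() {
    grep -c 'audit: backlog limit exceeded' "${1:-/var/log/syslog}"
}

# Print the most recent "audit_backlog=<n> > audit_backlog_limit=<m>" pair
# found in the log. Prints nothing if no such message is present.
latest_backlog() {
    grep -o 'audit_backlog=[0-9]* > audit_backlog_limit=[0-9]*' \
        "${1:-/var/log/syslog}" | tail -n 1
}
```

Running count_backlog_exceeded /var/log/syslog followed by latest_backlog /var/log/syslog gives a quick view of how often the limit was exceeded and how far the backlog has grown past it.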

 

  • Multiple core files were generated by the Manager:

/var/log/core/


core.java
core.opsAgent
core.perl
core.python3

  • Multiple core files were generated by the Edge:

/var/log/core/

 

core.datapathd

core.lb-dispatcher
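To see which processes are crashing most often, the core files can be grouped by process name. This is a rough sketch under stated assumptions: the function name summarize_cores is hypothetical, and it assumes core files are named core.<process>, optionally followed by further dot-separated suffixes.

```shell
# Print how many core files each process has left in a directory.
# $1: core directory (defaults to /var/log/core, as named in this article)
summarize_cores() {
    for f in "${1:-/var/log/core}"/core.*; do
        [ -e "$f" ] || continue        # skip the literal pattern if nothing matches
        name=${f##*/}                  # strip the directory part
        name=${name#core.}             # strip the "core." prefix
        printf '%s\n' "${name%%.*}"    # drop any further suffix (assumed layout)
    done | sort | uniq -c
}
```

Calling summarize_cores with no argument reads /var/log/core and prints one line per process with its core count.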

 

Environment

VMware NSX-T Data Center

Cause

The kernel contains a defect in how audit backlog messages are queued. Because of this defect, the backlog queue can grow beyond its configured maximum length (audit_backlog_limit): new entries are added to the queue but none are ever removed. Eventually the Manager/Edge runs out of memory entirely, and multiple processes hit memory allocation failures and crash.

 

Resolution

This issue is fixed in VMware NSX-T 3.2.3 and later.
This issue is fixed in VMware NSX-T 4.1.2 and later.


Workaround:
  • Root access is required. Use 'start engineer' to access the root shell.
  • Run the following commands to remove the audit=1 kernel parameter from the grub configuration and regenerate it:

sed -i '/GRUB_CMDLINE_LINUX/s/audit=1//' /etc/default/grub

update-grub

 

  • Then reboot the Manager/Edge for the change to take effect.
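The workaround can be previewed before it is applied and verified after the reboot. The sketch below is illustrative only: the function names preview_grub_change and has_audit_param are hypothetical, and it assumes the running kernel's command line is readable at /proc/cmdline, as on standard Linux systems.

```shell
# Show what each GRUB_CMDLINE_LINUX line would look like after the edit,
# without modifying the file. Prints nothing if audit=1 is not present.
# $1: grub defaults file (defaults to /etc/default/grub)
preview_grub_change() {
    sed -n '/GRUB_CMDLINE_LINUX/s/audit=1//p' "${1:-/etc/default/grub}"
}

# Exit 0 if the given file still carries the audit=1 parameter,
# non-zero otherwise. -w avoids matching values such as audit=10.
has_audit_param() {
    grep -qw 'audit=1' "$1"
}
```

Before running the sed command, preview_grub_change shows the resulting lines. After the reboot, has_audit_param /proc/cmdline && echo "audit=1 is still active" confirms whether the running kernel picked up the change.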

 

Additional Information

Impact/Risks:

Multiple Manager/Edge processes repeatedly crash and generate core files, which impacts the dataplane.