This article raises awareness of a critical memory leak in the kernel audit backlog queue on the NSX Edge.
2023-04-28T18:33:15.944295+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723467] audit: audit_backlog=1187323 > audit_backlog_limit=8192
2023-04-28T18:33:15.944297+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723469] audit: audit_lost=17229471 audit_rate_limit=0 audit_backlog_limit=8192
2023-04-28T18:33:15.944298+00:00 NSX-Edge-1-10-2-192-8 kernel - - - [13094823.723470] audit: backlog limit exceeded
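To check whether an Edge is logging the messages above, the kernel lines can be filtered for cases where the reported backlog is larger than the limit. The following is a minimal sketch, not an official tool: the function name `audit_backlog_over_limit` is hypothetical, and it assumes the syslog layout shown in the sample lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX): reads syslog lines on stdin and prints
# the timestamp, backlog and limit for every line where audit_backlog exceeds
# audit_backlog_limit. Assumes the line layout shown in the sample log above.
audit_backlog_over_limit() {
    awk '
        /audit: audit_backlog=[0-9]+ > audit_backlog_limit=[0-9]+/ {
            backlog = ""; limit = ""
            for (i = 1; i <= NF; i++) {
                # "audit_backlog=" does not match "audit_backlog_limit=" here
                if ($i ~ /^audit_backlog=/)       { split($i, a, "="); backlog = a[2] }
                if ($i ~ /^audit_backlog_limit=/) { split($i, b, "="); limit = b[2] }
            }
            # Force numeric comparison before printing a summary line
            if (backlog + 0 > limit + 0) print $1, "backlog=" backlog, "limit=" limit
        }
    '
}
```

It could be run as `audit_backlog_over_limit < /var/log/syslog`, where the log path is an assumption and may differ on the Edge.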
/var/log/core/
core.datapathd
core.lb-dispatcher
The kernel contains a defect in how it queues audit backlog messages. Because of this defect, the backlog queue can grow beyond its configured maximum length (audit_backlog_limit): new entries are added to the queue but none are ever removed. Eventually the Edge runs out of memory entirely, and multiple processes hit memory allocation errors and crash.
NSX-T 3.2.3
sed -i '/GRUB_CMDLINE_LINUX/s/audit=1//' /etc/default/grub
update-grub
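A change to the kernel command line only takes effect once the system boots with the regenerated GRUB configuration. After that, the running kernel's parameters can be inspected to confirm that audit=1 is gone. The following is a minimal sketch: the function name `cmdline_has_audit` is hypothetical, and it assumes the Edge exposes the standard Linux /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX): returns success (0) if the given
# kernel command line contains the exact parameter audit=1, failure otherwise.
cmdline_has_audit() {
    # Pad with spaces so audit=1 is matched only as a whole parameter
    case " $1 " in
        *" audit=1 "*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For example, `cmdline_has_audit "$(cat /proc/cmdline)" && echo "audit=1 is still active"` prints a warning if the running kernel was booted with auditing enabled.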
Multiple Edge processes crash repeatedly and generate core files, causing dataplane impact.