Memory leaks in NSX-T Edge

Article ID: 311847

Products

VMware NSX

Issue/Introduction

  • The Edge node runs out of memory.
  • The top command on the Edge node shows kauditd consuming high CPU:

/var/log/vmware/top-cpu.log:

  PID USER PR NI VIRT RES SHR S %CPU %MEM    TIME+ TGID COMMAND
   34 root 20  0    0   0   0 R 87.5  0.0 11650:31   34 [kauditd]
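
As a rough aid for checking this symptom, the sketch below pulls the %CPU value for kauditd out of a top-style log. The function name kauditd_cpu is hypothetical, and the sketch assumes the column layout shown above, where %CPU is the ninth field and the command name is the last field.

```shell
# Print the %CPU value of every [kauditd] row in a top-style log.
# Assumption: columns match the sample above (%CPU is field 9,
# COMMAND is the last field).
kauditd_cpu() {
    awk '$NF == "[kauditd]" { print $9 }' "$1"
}
```

For example, `kauditd_cpu /var/log/vmware/top-cpu.log` prints one value per logged sample of the kauditd thread.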

  • /proc/slabinfo shows high kmalloc-2048 usage:

/proc/slabinfo:
...
kmalloc-2048 659406 659406 2048 16 8 : tunables 0 0 0 : slabdata 41268 41268 0
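
To put the slabinfo line in perspective, the following sketch estimates how much memory a slab cache holds by multiplying the number of objects (third field) by the object size in bytes (fourth field). The function name slab_mib is hypothetical, and the result is only an approximation because it ignores per-slab overhead.

```shell
# Estimate the memory held by a slab cache, in whole MiB.
# Fields in /proc/slabinfo: name, active_objs, num_objs, objsize, ...
# Usage: slab_mib <cache-name> [slabinfo-file]
# Reading /proc/slabinfo normally requires root.
slab_mib() {
    awk -v name="$1" '$1 == name { print int($3 * $4 / 1048576) }' \
        "${2:-/proc/slabinfo}"
}
```

For the sample line above, 659406 objects of 2048 bytes come to roughly 1287 MiB, which is what `slab_mib kmalloc-2048` would report for that data.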

  • /var/log/syslog contains a large number of audit backlog messages such as "audit: audit_backlog=xxxx > audit_backlog_limit=8192":

/var/log/syslog:
...
<yyyy-mm-dd>T<hr:min:sec.xxx>Z <hostname> kernel - - - [xxxxxxxxx.xxxxxx] audit: audit_backlog=xxxxxx > audit_backlog_limit=8192
<yyyy-mm-dd>T<hr:min:sec.xxx>Z <hostname> kernel - - - [xxxxxxxxx.xxxxxx] audit: audit_lost=xxxxxx audit_rate_limit=0 audit_backlog_limit=8192
<yyyy-mm-dd>T<hr:min:sec.xxx>Z <hostname> kernel - - - [xxxxxxxxx.xxxxxx] audit: backlog limit exceeded
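
A quick way to gauge how often these messages occur is to count them. The sketch below is a minimal example; the function name count_audit_backlog is hypothetical, and the patterns are taken from the sample messages above.

```shell
# Count syslog lines that report an audit backlog overflow.
# Matches "audit: audit_backlog=..." and "audit: backlog limit exceeded".
# Usage: count_audit_backlog <syslog-file>
count_audit_backlog() {
    grep -cE 'audit: (audit_backlog=|backlog limit exceeded)' "$1"
}
```

Running `count_audit_backlog /var/log/syslog` prints the number of matching lines; a large and growing count is consistent with the symptom described here.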


Environment

VMware NSX-T Data Center
VMware NSX

Cause

  • This is a Linux kernel bug.
  • The exact trigger is unclear, but it is likely related to high system load combined with a high rate of audit events.
  • The system eventually runs out of memory and starts killing processes.
  • The only way to recover is to reboot the Edge node.

Resolution

This issue is resolved in NSX-T 3.2.3.

To work around the issue, remove the audit=1 kernel boot parameter:

  • Log in to the Edge node as the admin user.
  • Switch to the root account by running the command st en.
  • Open the file /etc/default/grub.
  • Look for the line GRUB_CMDLINE_LINUX="audit=1".
  • Remove or comment out this line and save the file.
  • Run update-grub2.
  • Reboot the Edge node.
  • After the reboot, check that the string audit=1 no longer appears in the output of `cat /proc/cmdline`.
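
The file edit and the final check in the steps above can be sketched as two small shell helpers. The function names comment_out_audit and has_audit_param are hypothetical. The sketch assumes the GRUB line reads exactly GRUB_CMDLINE_LINUX="audit=1" with no other parameters and no trailing whitespace; if the line differs, edit the file by hand instead.

```shell
# Comment out the exact line GRUB_CMDLINE_LINUX="audit=1" in the given
# file, leaving a backup copy next to it with a .bak suffix.
# Usage: comment_out_audit /etc/default/grub
comment_out_audit() {
    sed -i.bak 's/^GRUB_CMDLINE_LINUX="audit=1"$/#&/' "$1"
}

# Exit 0 if audit=1 is present as a standalone kernel parameter,
# non-zero otherwise. Defaults to the running kernel's command line.
# Usage: has_audit_param [cmdline-file]
has_audit_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'audit=1'
}
```

Run as root, `comment_out_audit /etc/default/grub` would be followed by update-grub2 and a reboot. Afterwards, `has_audit_param || echo "audit=1 is no longer set"` confirms the change.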