Disk write of NSX Edge VMs periodically spikes every hour.

Article ID: 322871

Updated On:

Products

VMware NSX

Issue/Introduction

  • The vCenter performance chart for Edge VMs shows disk writes spiking on the hour.

  • Other VMs might suffer from degraded storage performance if many Edge VMs reside on the same physical storage.
  • Many 8 MB files are generated on the hour in /var/log/journal/<machine-id>.
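
To check the last symptom on an Edge appliance, the journal files can be grouped by the hour in which they were last modified. The following is a minimal sketch, not a supported tool: the function name journal_files_per_hour is hypothetical, and it assumes GNU find, sort, uniq and awk are available on the appliance.

```shell
#!/bin/sh
# Count *.journal files per modification hour under a directory tree.
# journal_files_per_hour is a hypothetical helper name, not part of NSX.
# Requires GNU find for the -printf option.
journal_files_per_hour() {
    dir="$1"
    # Emit "YYYY-MM-DD HH:00" for each journal file, then count duplicates.
    find "$dir" -type f -name '*.journal' -printf '%TY-%Tm-%Td %TH:00\n' 2>/dev/null |
        sort | uniq -c |
        awk '{print $2, $3, $1}'   # reorder to: date hour count
}
```

Running journal_files_per_hour /var/log/journal prints one line per hour with the number of journal files last written in that hour. A large count recurring every hour would be consistent with the behavior described above.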

Environment

VMware NSX-T Data Center 3.1.0 - 3.1.2.1

Cause

  • Edge appliances run the integrity checker every hour. It executes find / -print0 | xargs -0 to check the integrity of many files in the appliance.

  • Since 3.1.0, auditd logs execve system calls, and these logs are stored in the journal. The integrity checker passes a very large number of arguments via xargs, and journald treats every argument in an execve log as a field name. The field hash table of the journal file therefore grows rapidly beyond its threshold, and the file is rotated immediately.

  • Each journal file is at least 8 MB, so the rapid rotation generates many 8 MB journal files and triggers a large amount of disk writes on the hour.

  • Manager VMs are not affected because auditd on them does not log execve system calls.
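
The mechanism above can be illustrated with two small helpers. This is a sketch under stated assumptions, not the integrity checker's actual code: args_per_exec and field_hash_fill are hypothetical names, and the "field hash table fill" label in the journalctl --header output is assumed and may be worded differently depending on the systemd version.

```shell
#!/bin/sh
# Print how many arguments each command started by xargs receives.
# Each output line corresponds to one execve call; "$1" is the directory to scan.
# args_per_exec is a hypothetical helper name.
args_per_exec() {
    find "$1" -print0 | xargs -0 sh -c 'echo "$#"' _
}

# Extract the field hash table fill value from `journalctl --header`
# output supplied on stdin. The label wording is an assumption, hence
# the case-insensitive match.
field_hash_fill() {
    grep -i 'field hash table fill' | sed 's/.*: *//'
}
```

For example, args_per_exec /etc shows how many file names end up in a single execve call, and journalctl --header | field_hash_fill shows how full the field hash table of each journal file is.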

Resolution

This issue is resolved in VMware NSX-T Data Center 3.1.3 and 3.2 and higher available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Two workarounds are available.
Workaround 1:

  1. Mask systemd-journald-audit.socket.
    /bin/systemctl stop systemd-journald-audit.socket
    /bin/systemctl disable systemd-journald-audit.socket
    /bin/systemctl mask systemd-journald-audit.socket
  2. Then restart journald.
    systemctl restart systemd-journald
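
After applying Workaround 1, the state of the socket unit can be checked with systemctl is-enabled, which reports masked for a masked unit. The helper below is a minimal sketch; the function name audit_socket_is_masked is hypothetical.

```shell
#!/bin/sh
# Return 0 if systemd-journald-audit.socket is masked, 1 otherwise.
# audit_socket_is_masked is a hypothetical helper name.
audit_socket_is_masked() {
    # is-enabled exits non-zero for masked units, so ignore its exit status
    # and inspect the printed state instead.
    state=$(systemctl is-enabled systemd-journald-audit.socket 2>/dev/null || true)
    case "$state" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical use is audit_socket_is_masked && echo "audit socket is masked".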

Workaround 2:

Disable integrity checker.
/opt/vmware/integrity-checker/bin/integrity_checker.py -f disable

Additional Information

Impact/Risks:

All Edge VMs trigger a large burst of disk writes at the same time, on the hour.

This might degrade datastore performance if many Edge VMs reside on the same physical storage.

Other VMs on that storage might then suffer from the degraded performance.