NSX Controller(s) in Disconnected State - Systemctl is unable to restart the rsyslog services

Article ID: 317873

Products

VMware NSX

Issue/Introduction

Symptoms:

  • The NSX Controller(s) are in a disconnected state.
  • The /var/log partition is at 100% usage:
root@nsx-controller [ / ]# df -h
Filesystem     Size Used Avail Use% Mounted on
devtmpfs       2.0G    0 2.0G  0% /dev
tmpfs          2.0G    0 2.0G  0% /dev/shm
tmpfs          2.0G  15M 2.0G  1% /run
tmpfs          2.0G    0 2.0G  0% /sys/fs/cgroup
/dev/sda2      3.9G 2.2G 1.5G 61% /
/dev/sda1      976M  41M 868M  5% /boot
tmpfs          2.0G 1.4M 2.0G  1% /tmp
/dev/sda3      3.9G 8.0M 3.6G  1% /os_bak
/dev/sda5      4.8G 4.8G    0 100% /var/log
/dev/sda6      2.0G 3.1M 1.8G  1% /config
/dev/sda4      3.9G 130M 3.5G  4% /var/cloudnet/data
/dev/sda7      4.8G  10M 4.6G  1% /image
tmpfs          396M    0 396M  0% /run/user/998
tmpfs          396M    0 396M  0% /run/user/0
  • In /var/log, syslog is 0 KB while syslog.1 keeps growing.
  • Deleting syslog.1 does not free any space; the /var/log partition stays at 100%.
  • systemctl cannot restart rsyslog, and querying the rsyslog status also times out:
root@nsx-controller [ ~ ]# /usr/bin/systemctl restart rsyslog.service
Failed to restart rsyslog.service: Activation of org.freedesktop.systemd1 timed out

root@nsx-controller [ /var/log ]# service rsyslog status
Failed to get properties: Activation of org.freedesktop.systemd1 timed out
  • /var/log/syslog.1 shows systemd failing to fork a service ("Cannot allocate memory"), aborting on an assertion, and freezing execution:
Sep 27 03:31:01 nsx-controller run-parts[28848][28904]: (/etc/cron.hourly) finished 0anacron

Sep 27 03:31:01 nsx-controller run-parts[28850][28947]: (/etc/cron.minutes) finished logrotate

Sep 27 03:31:02 nsx-controller systemd[1]: nvp-cli-merge-logs.service: Failed to fork: Cannot allocate memory

Sep 27 03:31:02 nsx-controller systemd[1]: Assertion 'pid >= 1' failed at src/core/unit.c:2026, function unit_watch_pid(). Aborting.

Sep 27 03:31:02 nsx-controller systemd[1]: Caught <ABRT>, cannot fork for core dump: Cannot allocate memory.

Sep 27 03:31:02 nsx-controller systemd[1]: Freezing execution.
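
The symptom checks above can be scripted. The following is a minimal POSIX shell sketch, not a VMware-provided tool: the function names pct_used, systemctl_responds and shows_systemd_freeze are hypothetical, and it assumes df, awk, grep and timeout are available on the controller.

```shell
#!/bin/sh
# Triage helpers for a disconnected NSX Controller (hypothetical names; a sketch).

# Print the Use% of the filesystem that holds DIR, without the "%" sign.
# df -P keeps each filesystem on one line, so field 5 is always Use%.
pct_used() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed only if systemd answers a manager property query within SECS seconds.
# On a frozen systemd the query either fails with "Activation of
# org.freedesktop.systemd1 timed out" or blocks, so bound it with timeout.
systemctl_responds() {
  timeout "$1" systemctl show --property=Version >/dev/null 2>&1
}

# Print the lines of FILE that match the systemd freeze signature.
# Exit status 0 means at least one matching line was found.
shows_systemd_freeze() {
  grep -F -e 'Freezing execution' \
          -e 'Activation of org.freedesktop.systemd1 timed out' "$1"
}
```

On an affected controller, pct_used /var/log would be expected to print 100, systemctl_responds 30 to fail, and shows_systemd_freeze /var/log/syslog.1 to print lines such as the "Freezing execution." entry quoted above.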

Cause

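This issue occurs when systemd is in a hung state. As the symptom logs show, systemd fails to fork ("Cannot allocate memory"), aborts on an assertion, and freezes execution. While systemd is frozen, every request to org.freedesktop.systemd1 times out, so no service can be started, stopped, or restarted. This also breaks log rotation. The postrotate script for /var/log/syslog cannot signal rsyslog.service through systemd ("Failed to kill unit rsyslog.service"), so rsyslog is never told to reopen its log file. It keeps writing to the rotated syslog.1 while the new syslog stays at 0 KB. Deleting syslog.1 does not free the space because rsyslog still holds the file open. The failures appear in /var/log/syslog.1: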

Sep 27 03:32:26 nsx-controller CROND[29247]: (root) CMDOUT (Failed to kill unit rsyslog.service: Connection timed out)

Sep 27 03:32:26 nsx-controller dbus[1385]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out

Sep 27 03:32:26 nsx-controller systemd-logind[1416]: Failed to start session scope session-c124197.scope: Activation of org.freedesktop.systemd1 timed out

Sep 27 03:32:26 nsx-controller CROND[29247]: (root) CMDOUT (error: error running non-shared postrotate script for /var/log/syslog of '/var/log/syslog)

Sep 27 03:32:26 nsx-controller CROND[29247]: (root) CMDOUT (')
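
Removing a file only removes its directory entry. Its blocks stay allocated until every process that has the file open closes it. To check whether a deleted log is still holding space on /var/log, the following POSIX shell sketch walks the /proc/<pid>/fd symlinks, which Linux reports with a " (deleted)" suffix once the target has been unlinked. The function name list_deleted_open is hypothetical. Run the script as root to see every process.

```shell
#!/bin/sh
# List open file descriptors whose target under DIR has been deleted
# (hypothetical helper; a sketch). Reading a /proc/<pid>/fd/<n> link for
# an unlinked file returns "/path/to/file (deleted)".
list_deleted_open() {
  dir=${1%/}
  for fd in /proc/[0-9]*/fd/*; do
    # Skip descriptors that vanished or that we are not allowed to read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      "$dir"/*" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}

list_deleted_open /var/log
```

On an affected controller, this would be expected to list an rsyslogd descriptor pointing at /var/log/syslog.1 (deleted). The space is released only when that process closes the file or exits. In this issue, that means restarting rsyslog, which itself requires a responsive systemd.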

Resolution

This issue is resolved in NSX 6.4.10.


Workaround:

A reboot of the controller can be used as a temporary workaround.