NSX install-upgrade service goes down intermittently

Article ID: 389610

Products

VMware NSX

Issue/Introduction

  • The install-upgrade service intermittently goes to a stopped state and returns to a running state within a few minutes.
  • This can be observed by logging in to the NSX Manager appliance as the 'admin' user and running the following command:

get service install-upgrade
Service name: install-upgrade
Service state: stopped
Enabled on: 

get service install-upgrade
Service name: install-upgrade
Service state: running
Enabled on: 
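Because the stop is intermittent, it can be easier to catch by capturing the output of `get service install-upgrade` repeatedly and checking it with a script than by watching the console. The following is a minimal Python sketch. It assumes the output has already been saved as text in the format shown above. The function names are hypothetical, and the sketch does not connect to NSX Manager itself.

```python
def parse_service_status(text: str) -> dict:
    """Map each 'Key: value' line of captured `get service` output to a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only; a key such as 'Enabled on' may have no value.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_stopped(text: str) -> bool:
    """Return True if the captured output reports 'Service state: stopped'."""
    return parse_service_status(text).get("Service state") == "stopped"


# Output captured from `get service install-upgrade`, as shown above.
sample = """Service name: install-upgrade
Service state: stopped
Enabled on:
"""

print(is_stopped(sample))
```

Splitting on the first colon keeps keys whose value is empty, such as `Enabled on`, instead of dropping them.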

  • The NSX Manager's /var/log/syslog and /var/log/upgrade-coordinator/upgrade-coordinator.log may show output similar to the following. The upgrade-coordinator service is killed by the Out-Of-Memory (OOM) killer and restarted, which leaves the install-upgrade service stopped for a few minutes.

##-##-##T##:##:##.###Z nsx-manager-01 systemd 1 - - upgrade-coordinator.service: A process of this unit has been killed by the OOM killer.
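To see how often the OOM killer has hit the upgrade-coordinator service, the systemd message above can be matched in a copy of /var/log/syslog. This Python sketch assumes the message text is exactly as shown. The regular expression and function name are illustrative and are not part of NSX.

```python
import re
from collections import Counter

# Matches the systemd message shown above and captures the unit name.
OOM_UNIT_RE = re.compile(
    r"(?P<unit>[\w@.-]+\.service): A process of this unit has been killed by the OOM killer"
)


def oom_killed_units(lines):
    """Yield the systemd unit name from each OOM-kill message in `lines`."""
    for line in lines:
        match = OOM_UNIT_RE.search(line)
        if match:
            yield match.group("unit")


# Sample line copied from the syslog excerpt above (timestamp masked).
sample = [
    "##-##-##T##:##:##.###Z nsx-manager-01 systemd 1 - - "
    "upgrade-coordinator.service: A process of this unit has been killed by the OOM killer.",
]

print(Counter(oom_killed_units(sample)))
```

On the appliance, the same generator can be given an open file object for /var/log/syslog, because iterating a file yields one line at a time. `Counter` then reports how many kills each unit received.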

  • When available memory runs low because of elevated memory usage across all services, /var/log/vmware/top-mem.log on the NSX Manager shows swap being engaged. Swap is used only after physical memory has been exhausted, and the output below was captured before that point. The log location is given so that memory usage can be monitored in your environment where required.

top - 10:57:01 up 60 days, 10:48,  1 user,  load average: 2.67, 2.66, 2.70
Tasks: 355 total,   2 running, 353 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.1 us,  8.4 sy,  0.0 ni, 72.0 id,  8.0 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem : 24573824 total,   204840 free, 23639684 used,   729300 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   560916 avail Mem 
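The memory lines of the snapshot above can also be checked programmatically. The sketch below assumes each snapshot in /var/log/vmware/top-mem.log uses the same `KiB Mem` / `KiB Swap` layout as the sample (other `top` builds may report MiB instead). The 10% alert threshold is an arbitrary, hypothetical value to tune for your site.

```python
import re

# Patterns for the memory lines of a `top` snapshot, in the KiB layout shown above.
MEM_RE = re.compile(
    r"KiB Mem\s*:\s*(?P<total>\d+) total,\s*(?P<free>\d+) free,\s*"
    r"(?P<used>\d+) used,\s*(?P<buff>\d+) buff/cache"
)
SWAP_RE = re.compile(
    r"KiB Swap:\s*(?P<total>\d+) total,\s*(?P<free>\d+) free,\s*"
    r"(?P<used>\d+) used\.\s*(?P<avail>\d+) avail Mem"
)

# Hypothetical alert threshold: flag when less than 10% of RAM is available.
LOW_AVAIL_PCT = 10.0


def memory_summary(snapshot: str) -> dict:
    """Extract memory and swap figures (KiB) from one `top` header."""
    mem = MEM_RE.search(snapshot)
    swap = SWAP_RE.search(snapshot)
    if mem is None or swap is None:
        raise ValueError("snapshot does not contain KiB Mem / KiB Swap lines")
    total = int(mem["total"])
    avail = int(swap["avail"])
    avail_pct = 100.0 * avail / total
    return {
        "total_kib": total,
        "used_kib": int(mem["used"]),
        "avail_kib": avail,
        "avail_pct": avail_pct,
        "swap_total_kib": int(swap["total"]),
        "swap_used_kib": int(swap["used"]),
        "low_memory": avail_pct < LOW_AVAIL_PCT,
    }


# Memory lines copied from the top-mem.log excerpt above.
sample = """\
KiB Mem : 24573824 total,   204840 free, 23639684 used,   729300 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   560916 avail Mem
"""

summary = memory_summary(sample)
print(f"{summary['avail_pct']:.1f}% available, low={summary['low_memory']}")
```

For the sample above, this reports roughly 2.3% of memory available (560916 of 24573824 KiB) and a swap total of 0 KiB.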

  • When the Linux kernel detects a low-memory condition, the OOM killer starts terminating processes chosen by a heuristic. For example, /var/log/kern.log may contain entries like the following:

####-##-##T##:##:##.###Z nsx-manager-0# kernel - - - [#######.######] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(nil),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/upgrade-coordinator.service
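The `oom-kill:` line is a comma-separated list of `key=value` fields plus bare flags such as `global_oom`. Its `task_memcg` field names the cgroup, and therefore the systemd unit, that the killed task belonged to. The following minimal Python sketch pulls these fields apart, assuming the line layout shown above. It splits naively on commas, so a value that itself contains commas (for example a multi-node `mems_allowed` list) would need more careful parsing. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_oom_kill(line: str) -> Optional[Dict[str, str]]:
    """Return the fields after 'oom-kill:' as a dict, or None if absent."""
    marker = "oom-kill:"
    start = line.find(marker)
    if start == -1:
        return None
    fields = {}
    for item in line[start + len(marker):].strip().split(","):
        key, sep, value = item.partition("=")
        # Bare flags without '=' (such as 'global_oom') get an empty value.
        fields[key] = value if sep else ""
    return fields


def memcg_unit(fields: Dict[str, str]) -> str:
    """Return the last path component of task_memcg, e.g. the systemd unit."""
    return fields.get("task_memcg", "").rsplit("/", 1)[-1]


# Sample line copied from the kern.log excerpt above (placeholders kept).
sample = (
    "####-##-##T##:##:##.###Z nsx-manager-0# kernel - - - [#######.######] "
    "oom-kill:constraint=CONSTRAINT_NONE,nodemask=(nil),cpuset=/,mems_allowed=0,"
    "global_oom,task_memcg=/system.slice/upgrade-coordinator.service"
)

fields = parse_oom_kill(sample)
print(fields["constraint"], memcg_unit(fields))
```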

Environment

VMware NSX 4.x

Cause

Insufficient memory resources are assigned to the NSX Manager appliance virtual machine, which results in stalled functionality.

Resolution

This issue is resolved in VMware NSX 4.2.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
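Whether an appliance already runs a release that contains the fix can be decided by comparing version numbers numerically rather than as strings. Compared as strings, "4.2.10" would sort before "4.2.2". This Python sketch assumes a purely numeric dotted version string. The function names are hypothetical, and the build numbers in the example are made up for illustration.

```python
FIXED_IN = (4, 2, 2)  # First release containing the fix, per this article.


def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version into a tuple of ints (ValueError otherwise)."""
    return tuple(int(part) for part in version.strip().split("."))


def includes_fix(version: str) -> bool:
    """Compare only major.minor.patch; trailing build fields do not matter."""
    return parse_version(version)[:3] >= FIXED_IN


for candidate in ("4.2.1.0.0.12345678", "4.2.2.0.0.12345678"):
    print(candidate, includes_fix(candidate))
```

Tuple comparison in Python is element-wise, so `(4, 2, 10)` correctly sorts after `(4, 2, 2)`.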

Workaround:

Resize the NSX Manager appliances to a size that meets the memory and CPU requirements.
For the procedure, refer to "Resize an NSX Manager Node" in the NSX documentation.