Upgrading VMware NSX ESXi transport node(s) fails with "out of memory" error

Article ID: 322648

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • You are upgrading VMware NSX and some of the ESXi transport nodes fail to upgrade.
  • This is an In-place upgrade of the ESXi transport node.
  • The target version of the upgrade is lower than 4.1.1, including any 3.x version.
  • After the host is rebooted, it still does not upgrade and the same alerts remain.
  • In the ESXi logfile /var/run/log/esxupdate.log you see the following:
HostImage: DEBUG: installer LiveImageInstaller failed: VMware_bootbank_nsx-esx-datapath_4.1.0.2.0-7.0.21761693: VMware_bootbank_nsx-esx-datapath_4.1.0.2.0-7.0.21761693: Error in running [/etc/init.d/nsx-datapath-dl start upgrade]:
Return code: 1
Output: start upgrade begin
Exception:
Traceback (most recent call last):
  File "/etc/init.d/nsx-datapath-dl", line 1251, in <module>
    DualLoadUpgrade()
  File "/etc/init.d/nsx-datapath-dl", line 1066, in DualLoadUpgrade
    LoadKernelModules()
  File "/etc/init.d/nsx-datapath-dl", line 217, in LoadKernelModules
    nsxesxutils.loadModule(modName, modParam)
  File "/usr/lib/vmware/nsx-esx-datapath/lib64/python/nsxesxutils.py", line 576, in loadModule
    raise Exception('Failed to load module %s: %s' %
Exception: Failed to load module nsx-esx-70u3/nsxt-ens-21761693: vmkmod: VMKModLoad: VMKernel_LoadKernelModule(nsxt-ens-21761693): Out of memory
Cannot load module nsx-esx-70u3/nsxt-ens-21761693: Out of memory
It is not safe to continue. Please reboot the host immediately to discard the unfinished update.. Clean up the installation.
  • In the VMware NSX Manager logfile /var/log/syslog you see the following:
init.d/nsx-datapath-dl: WARNING: nsx-datapath-dl upgrade failed: Failed to load module nsx-esx-70u3/nsxt-ens-21761693: vmkmod: VMKModLoad: VMKernel_LoadKernelModule(nsxt-ens-21761693): Out of memory Cannot load module nsx-esx-70u3/nsxt-ens-21761693: Out of memory
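
The log check described in the symptoms above can be sketched as a small shell function. This is a minimal illustration, not a VMware tool: the helper name has_oom_signature is hypothetical, and the search pattern is taken only from the log excerpts shown in this article. The default path is the ESXi log named above; the NSX Manager syslog path can be passed instead.

```shell
#!/bin/sh
# Sketch: report whether a log file contains the kernel module
# "Out of memory" load failure shown in the excerpts above.
# has_oom_signature is a hypothetical helper name, not an NSX or ESXi command.
has_oom_signature() {
    log_file="$1"
    # In a basic regular expression the parentheses match literally.
    grep -q 'VMKernel_LoadKernelModule(.*): Out of memory' "$log_file" 2>/dev/null
}

# Default to the ESXi log named in this article; pass another path to override.
LOG="${1:-/var/run/log/esxupdate.log}"
if has_oom_signature "$LOG"; then
    echo "Out-of-memory module load failure found in $LOG"
else
    echo "No out-of-memory module load failure found in $LOG"
fi
```

A missing or unreadable log file is reported the same way as a log without the signature, so confirm the path exists before relying on a negative result.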


Environment

VMware NSX-T Data Center 4.x
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center

Cause

During the upgrade, there are insufficient buddies (2 MB memory blocks) available, which can cause a failure to grow the read-only memory section. This can occur if the memory is fragmented or if VMs have consumed most of the memory.
The host does not upgrade after the reboot (performed as instructed in the log: Please reboot the host immediately to discard the unfinished update.. Clean up the installation) because it is in NSX maintenance mode and is therefore skipped when the upgrade is retried.

Resolution

This issue is resolved in VMware NSX 4.1.1, available at VMware downloads.
This release adds a precheck that detects a read-only memory issue and stops the upgrade from continuing, which prevents the transport node from entering an inconsistent state.
If the precheck raises this alert, reboot the host and rerun the prechecks.
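
Whether a given target version predates the fix can be checked with a simple version comparison. The following is a rough sketch under stated assumptions: version_lt and FIXED_VERSION are hypothetical names, only the first three dot-separated fields are compared, and every field is assumed to be numeric. The default value 4.1.0.2 is taken from the VIB name in the log excerpt above.

```shell
#!/bin/sh
# Sketch: succeed (status 0) when version $1 is lower than version $2.
# version_lt and FIXED_VERSION are hypothetical names for illustration.
FIXED_VERSION="4.1.1"

version_lt() {
    # Pad with ".0.0" so short versions such as "4" or "4.1" have three fields.
    a="$1.0.0"
    b="$2.0.0"
    for i in 1 2 3; do
        fa=$(printf '%s' "$a" | cut -d. -f"$i")
        fb=$(printf '%s' "$b" | cut -d. -f"$i")
        if [ "$fa" -lt "$fb" ]; then return 0; fi
        if [ "$fa" -gt "$fb" ]; then return 1; fi
    done
    return 1  # equal in all compared fields, so not lower
}

target="${1:-4.1.0.2}"
if version_lt "$target" "$FIXED_VERSION"; then
    echo "$target is lower than $FIXED_VERSION and predates the fix"
else
    echo "$target is $FIXED_VERSION or later"
fi
```

The comparison is done field by field rather than as a string so that, for example, 4.10.0 is not mistaken for a version lower than 4.2.0.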

Workaround:
  • If you are planning an upgrade and want to avoid this issue, either upgrade to 4.1.1 or later, or reboot the transport nodes before attempting the upgrade.
  • If you have already started the upgrade and encountered this issue, set the transport node to upgrade in non-dual-load mode, in which the previous version of the VMware NSX VIBs is removed before the new ones are installed. After rebooting the transport node, create the following file:
#mkdir /tmp/nsx2
#echo '{"dual_load": false}' >/tmp/nsx2/debug
Then retry the upgrade of the transport node from the VMware NSX upgrade page.

Note: Because the file is created in the /tmp/ directory, create it after the reboot, as the reboot removes files from the /tmp/ directory.
Also, after a failed upgrade attempt and the reboot recommended in esxupdate.log ('It is not safe to continue. Please reboot the host immediately to discard the unfinished update.. Clean up the installation.'), the host may be in VMware NSX Maintenance Mode (MM) and may therefore be skipped when you retry the upgrade. Review the host and exit VMware NSX MM before attempting the upgrade again.
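
The file creation step from the workaround can also be wrapped in a small function that writes the file and prints it back for confirmation. This is a sketch only: write_dual_load_override is a hypothetical helper name, the optional directory argument exists solely so the function can be tried in a scratch location, and mkdir -p is used in place of the plain mkdir above so that an existing directory is not treated as an error. As noted above, run it on the transport node after the reboot.

```shell
#!/bin/sh
# Sketch: create the workaround file described in this article and print it.
# write_dual_load_override is a hypothetical helper name, not an NSX command.
write_dual_load_override() {
    # Default directory is the one given in the article.
    dir="${1:-/tmp/nsx2}"
    mkdir -p "$dir" || return 1
    # Same content as the echo command in the workaround.
    printf '%s\n' '{"dual_load": false}' > "$dir/debug" || return 1
    # Print the file back so the content can be confirmed by eye.
    cat "$dir/debug"
}

# With no argument this writes /tmp/nsx2/debug.
write_dual_load_override "$@"
```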