Edge node VM upgrade failed due to SUB-NUMA clustering

Article ID: 404890

Products

VMware NSX

Issue/Introduction

The upgrade of the Edge VM failed with the following error.

From upgrade-coordinator.log:


INFO http-nio-127.0.0.1-7442-exec-2 EdgeUpgradePlugin 946840 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] UA reported status: IN_PROGRESS UU UpgradeUnit [id=6fa####-0a##-49##-90##-cf1#######6d, TransportNodeID =6fa####-0a##-49##-90##-cf1#######6d, name=<edge_node>, description=null, type=EDGE, upgradeUnitSubtype=RESOURCE, currentVersion=3.2.3.1.0.22104642, warnings=[Pnic status of the edge transport node 6fa####-0a##-49##-90##-cf1#######6d is DOWN., Overall status of the edge transport node 6fa####-0a##-49##-90##-cf1#######6d is DOWN.], errors=[{"moduleName":"upgrade-coordinator","errorCode":30252,"errorMessage":"Edge 4.2.2.1.0.24765084/Edge/nub/VMware-NSX-edge-4.2.2.1.0.24765090.nub switch OS task failed on edge TransportNode 6fa####-0a##-49##-90##-cf1#######6d: clientType EDGE , target edge fabric node id 6fa####-0a##-49##-90##-cf1#######6d, return status switch_os execution failed with msg: An unexpected exception occurred: CommandFailedError: Command ['chroot', '/os_bak', '/opt/vmware/nsx-edge/bin/config.py', '--update-only'] returned non-zero code 1: b'lspci: Unable to load libkmod resources: error -2\nlspci: Unable to load libkmod resources: error -2\nlspci: Unable to load libkmod resources: error -2\nlspci: Unable to load libkmod resources: error -2\nlspci: Unable to load libkmod resources: error -2\nlspci: Unable to load libkmod resources: error -2\nSystem has not been booted with systemd as init system (PID 1). 
Can\'t operate.\nFailed to connect to bus: Host is down\nERROR: Unable to get maintenance mode information\nNsxRpcClient encountered an error: [Errno 2] No such file or directory\n/opt/vmware/nsx-edge/bin/config.py:1738: DeprecationWarning: The \'warn\' method is deprecated, use \'warning\' instead\n  cfg_logger.warn("Exception reading InbandMgmtInterfaceMsg from nestdb, %s", e)\nWARNING: Exception reading InbandMgmtInterfaceMsg from nestdb, Command \'[\'/opt/vmware/nsx-nestdb/bin/nestdb-cli\', \'--json\', \'--cmd\', \'get\', \'InbandMgmtInterfaceMsg\']\' returned non-zero exit status 1.\nERROR: NSX Edge configuration has failed. Sub-NUMA Clustering is not supported\n' ."}],
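
To confirm that a failed Edge upgrade matches this article, the upgrade-coordinator log can be searched for the errorCode field and the config.py message quoted above. The following is a minimal Python sketch, not a VMware tool: the function name scan_upgrade_log is hypothetical, and it assumes each log entry carries errorCode in the JSON-style form "errorCode":30252 shown above.

```python
import re

# Message printed by config.py in the failing upgrade step quoted above.
SNC_MESSAGE = "Sub-NUMA Clustering is not supported"

# Matches JSON-style "errorCode":<number> fields inside upgrade-coordinator entries.
ERROR_CODE_RE = re.compile(r'"errorCode":(\d+)')


def scan_upgrade_log(lines):
    """Scan an iterable of log lines (hypothetical helper).

    Returns (error_codes, snc_hits):
      error_codes: sorted list of distinct integer errorCode values found.
      snc_hits: 1-based line numbers that contain the Sub-NUMA message.
    """
    codes = set()
    snc_hits = []
    for lineno, line in enumerate(lines, start=1):
        codes.update(int(code) for code in ERROR_CODE_RE.findall(line))
        if SNC_MESSAGE in line:
            snc_hits.append(lineno)
    return sorted(codes), snc_hits
```

Because the function accepts any iterable of strings, an open file handle for a copy of upgrade-coordinator.log can be passed directly; a result containing 30252 together with at least one Sub-NUMA hit points to the cause described below.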

Environment

  • VMware NSX-T Data Center
  • VMware NSX

Cause

Changes were made to the Edge VM settings through the vCenter GUI. In this case, the Edge VM's Cores per Socket value was changed from 1 to 16. Editing an Edge VM's resources, rather than deploying a new Edge of the desired form factor, leaves the NUMA configuration in the Edge operating system misaligned. This misalignment can cause performance issues and triggers the NSX upgrade pre-check alarm; during the upgrade itself, config.py fails with "Sub-NUMA Clustering is not supported", as shown in the log above.
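
One way to spot this kind of manual edit is to compare the CPU topology recorded in the Edge VM's .vmx file against the value it was deployed with. The Python sketch below is illustrative only. It assumes the standard .vmx keys numvcpus and cpuid.coresPerSocket, and it treats a missing key as 1. It also uses a baseline of 1 core per socket, which mirrors this case and is an assumption rather than a documented default for every Edge form factor. The function names are hypothetical.

```python
import re

# Matches .vmx lines of the form: key = "value"
VMX_LINE_RE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"([^"]*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict; keys are lower-cased for case-insensitive lookup."""
    entries = {}
    for line in text.splitlines():
        match = VMX_LINE_RE.match(line)
        if match:
            entries[match.group(1).lower()] = match.group(2)
    return entries


def cpu_topology(text):
    """Return (vcpus, cores_per_socket, sockets) from .vmx text.

    Assumption: a missing numvcpus or cpuid.coresPerSocket entry means 1.
    """
    entries = parse_vmx(text)
    vcpus = int(entries.get("numvcpus", "1"))
    cores = int(entries.get("cpuid.corespersocket", "1"))
    return vcpus, cores, vcpus // cores


def topology_changed(text, expected_cores_per_socket=1):
    """True if Cores per Socket differs from the expected (as-deployed) value."""
    _, cores, _ = cpu_topology(text)
    return cores != expected_cores_per_socket
```

For example, a hypothetical 16-vCPU Edge whose Cores per Socket was raised to 16 reports a single 16-core socket and is flagged as changed, while the unedited 1-core-per-socket layout is not.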

Resolution

Option 1:

  • Place the affected Edge node in maintenance mode and shut down the Edge VM.
  • Revert the changes made in vCenter (in this case, set Cores per Socket back to 1).
  • Power on the Edge VM.
  • Re-run the upgrade pre-check and continue the upgrade.


Option 2:

  • If Option 1 does not resolve the issue, replace the Edge node in the Edge cluster with a newly deployed Edge of the desired form factor.

Additional Information

The Sub-NUMA Clustering error also occurs on bare metal Edges that have custom changes. For that case, refer to KB 320664.