Edge or Manager upgrade to 4.2.4 fails when the appliance fails to boot, new 4.2.4 deployments also impacted.



Article ID: 445142


Products

VMware NSX

Issue/Introduction

  • SDDC Edge upgrade errors

    Edge group upgrade status is FAILED for group <UUID> <GROUP NAME> : [Edge 4.2.4.0.0.25410638/Edge/nub/VMware-NSX-edge-4.2.4.0.0.25410643.nub post power on task failed on edge TransportNode <UUID>: clientType EDGE . target edge fabric node id <UUID>, return status Issuing post_power_on failed to complete in time

    Upgrade Agent on Edge node <UUID> is unreachable. Restart the Upgrade agent service and check network connectivity.
    Management plane connection status of the edge transport node <UUID> is DOWN.
    Cannot upgrade edge node <UUID> , has errors Errors = [{"moduleName":"upgrade-coordinator","errorCode":30249,"errorMessage":"Upgrade Agent on Edge node <UUID> is unreachable. Restart the Upgrade agent service and check network connectivity."}, {"moduleName":"upgrade-coordinator","errorCode":30205,"errorMessage":"Management plane connection status of the edge transport node <UUID> is DOWN."}, ] .
  • On the NSX Manager, /var/log/upgrade-coordinator/upgrade-coordinator.log contains an ERROR similar to this example:
    <DATE>T10:14:17.331Z ERROR task-executor-24-1-workitem-EDGE-<UUID> EdgeNodeUpgradeServiceImpl 879798 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30290" level="ERROR" subcomp="upgrade-coordinator"] Edge 4.2.4.0.0.25410638/Edge/nub/VMware-NSX-edge-4.2.4.0.0.25410643.nub post power on task failed on edge TransportNode <UUID>: clientType EDGE , target edge fabric node id <UUID>, return status Issuing post_power_on failed to complete in time .
    com.vmware.nsx.management.upgrade.exceptions.UpgradeAgentMessagingServiceException: Issuing post_power_on failed to complete in time
  • NSX Manager boot, or the upgrade reboot_os step, hangs. The console shows the boot stuck at the following point:
    A start job is running for Copy jars to NSX application dirs
  • NSX Edge boot up is stuck at the following point:
    A start job is running for Set OVF ... rams
  • SSH to the appliance fails. On the direct console, login as the admin or audit user does not work; root login may succeed.
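
To confirm the log symptom above across a large upgrade-coordinator.log, the search can be scripted. The following is a minimal sketch, not a supported tool: it assumes the log lines look like the example quoted in this article, and the function name is hypothetical.

```python
import re

# Strings taken from the example log lines quoted in this article.
TIMEOUT_TEXT = "Issuing post_power_on failed to complete in time"
ERROR_CODE_RE = re.compile(r'errorCode="(MP\d+)"')


def find_post_power_on_failures(lines):
    """Return (line_number, error_code) for every line reporting the timeout.

    error_code is None when the line carries the timeout text but no
    errorCode field (for example, the Java exception line).
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if TIMEOUT_TEXT not in line:
            continue
        match = ERROR_CODE_RE.search(line)
        hits.append((number, match.group(1) if match else None))
    return hits


if __name__ == "__main__":
    # Path as given in this article; run on the NSX Manager or on a copy of the log.
    path = "/var/log/upgrade-coordinator/upgrade-coordinator.log"
    with open(path, errors="replace") as handle:
        for number, code in find_post_power_on_failures(handle):
            print(f"line {number}: errorCode={code}")
```

A match only confirms the symptom; the cause still has to be checked against the CPU and EVC criteria in the Cause section.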

Environment

VMware NSX 4.2.4 (VCF 5.2.4)

Cause

NSX 4.2.4 appliances require the RDSEED CPU instruction to function. If this instruction is missing from the underlying physical server's CPU, or is masked by a legacy vSphere Enhanced vMotion Capability (EVC) cluster baseline, the appliance cannot boot properly. This issue can impact both upgrades and new installations.
This issue occurs when NSX appliances run on vSphere clusters that match either of the following criteria:

1. Unsupported VMware EVC Baseline Configurations

Intel CPU Mode: Intel "Haswell" Generation (L6) or earlier.
AMD CPU Mode: AMD Opteron™ "Steamroller" Generation (B3) or earlier

or

2. Unsupported Server CPU Series (with EVC Disabled)

Intel "Haswell" (or earlier) processors.
AMD Opteron™ "Steamroller" (or earlier) processors
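
Whether RDSEED is visible to a guest can be checked from any Linux VM that runs on the same cluster with the same EVC setting, because Linux lists the feature as the rdseed flag in /proc/cpuinfo. The sketch below is illustrative only; the function names are hypothetical, and running it on the NSX appliance itself is not something this article documents.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text.

    Flags from all listed processors are merged into one set.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_rdseed(cpuinfo_text):
    """True when the rdseed flag is exposed to this guest."""
    return "rdseed" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        exposed = has_rdseed(handle.read())
    print("RDSEED exposed to guest:", exposed)
```

If the flag is absent inside the guest but the physical CPU supports it, the EVC baseline is the likely reason it is masked.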

Resolution

This is a known issue impacting VMware NSX 4.2.4.

Remediation options:

Option 1: For EVC clusters configured with an unsupported mode.
Change the vSphere cluster EVC setting from one of the unsupported modes listed above to a newer, supported CPU mode. See the Resolution section of KB#318962.
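
When many clusters need to be reviewed, the EVC criteria from the Cause section can be expressed as a lookup. This is a rough sketch under a stated assumption: the mode keys below are the identifiers vSphere is believed to use for the "Haswell or earlier" and "Steamroller or earlier" baselines, and they should be verified against the EVC mode shown for your own clusters before relying on the result. The function name is hypothetical.

```python
# ASSUMPTION: EVC mode keys as commonly reported by vSphere; verify before use.
AFFECTED_INTEL = {
    "intel-merom", "intel-penryn", "intel-nehalem", "intel-westmere",
    "intel-sandybridge", "intel-ivybridge", "intel-haswell",
}
AFFECTED_AMD = {
    "amd-rev-e", "amd-rev-f", "amd-greyhound-no3dnow", "amd-greyhound",
    "amd-bulldozer", "amd-piledriver", "amd-steamroller",
}


def in_affected_list(evc_mode_key):
    """True when the EVC mode key is one of the baselines listed above.

    False means "not in the list", which includes unrecognized keys;
    it is not a statement that the mode is supported.
    """
    key = evc_mode_key.strip().lower()
    return key in AFFECTED_INTEL or key in AFFECTED_AMD
```

A cluster with EVC disabled is not covered by this lookup; in that case the physical CPU generation decides, as described in criterion 2.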

or

Option 2: For unsupported CPU hardware
If the NSX appliances are hosted on older physical servers with EVC disabled, migrate the VMs to hosts with a supported CPU generation.
 
or 

Option 3: If the recommended resolutions above cannot be implemented

On vSphere 8.0 or later, configure the NSX appliance to use entropy from the ESXi host (configure an External Entropy Source).

Reference: Configure an External Entropy Source

1. Log in to vCenter Server from vSphere Client.
2. Browse to the NSX Appliance VM.
3. Shut down the VM.
4. Right-click the VM and click Edit Settings.
5. Select Advanced Parameters.
6. Add the isolation.tools.getEntropy.disable parameter, or edit it if it already exists, and set its value to FALSE.
7. Click OK.
8. Power on the appliance.
9. Resume the installation or upgrade.
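
After the steps above, the setting can be double-checked across several appliance VMs. The following sketch assumes the advanced parameters of a VM have already been exported into a plain dictionary of names and values by whatever means is available; the export itself, the function name, and the case-insensitive comparison are assumptions, not documented behavior.

```python
# Parameter name as given in step 6 of this article.
ENTROPY_PARAM = "isolation.tools.getEntropy.disable"


def external_entropy_enabled(advanced_params):
    """True when the parameter is present and set to FALSE.

    advanced_params: dict mapping advanced parameter names to values.
    ASSUMPTION: names and values are compared case-insensitively.
    """
    wanted = ENTROPY_PARAM.lower()
    for name, value in advanced_params.items():
        if name.lower() == wanted:
            return str(value).strip().lower() == "false"
    # A missing parameter means the step has not been applied.
    return False
```

For example, `external_entropy_enabled({"isolation.tools.getEntropy.disable": "FALSE"})` reports that the workaround is in place for that VM.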