VMware NSX-T edge node prechecks option greyed out during upgrade

Article ID: 375895

Updated On:

Products

VMware NSX Networking

Issue/Introduction

  • During an NSX-T upgrade, the edge prechecks never start.
  • The following log entries appear in the NSX-T manager log /var/log/upgrade-coordinator/upgrade-coordinator-tomcat-wrapper.log:
    INFO | jvm 3 | <DATE/TIME> | #
    INFO | jvm 3 | <DATE/TIME> | # java.lang.OutOfMemoryError: Java heap space
    STATUS | wrapper | <DATE/TIME> | The JVM has run out of memory. Requesting thread dump.
    STATUS | wrapper | <DATE/TIME> | Dumping JVM state.
    STATUS | wrapper | <DATE/TIME> | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
    INFO | jvm 3 | <DATE/TIME> | # -XX:OnOutOfMemoryError="gzip -f /image/core/uc_oom.hprof"
    INFO | jvm 3 | <DATE/TIME> | # Executing /bin/sh -c "gzip -f /image/core/uc_oom.hprof"...
  • An Upgrade Coordinator core dump is present in the /image/core/ directory on the NSX-T manager:

    -rw------- 1 uuc uuc 112M <DATE/TIME> uc_oom.hprof.gz

  • The environment has a high number of edge clusters and compute collections, which can be verified with the following API calls:

    GET https://<NSX_IP>/api/v1/edge-clusters

        "result_count": 46,

    GET https://<NSX_IP>/api/v1/fabric/compute-collections

        "result_count": 2441,

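The two counts above can be pulled out of the API responses programmatically. The following is a minimal sketch, not an official tool: the function names are hypothetical, the sample bodies are reduced to the single `result_count` field shown in this article, and fetching the responses (authentication, TLS settings) is left to whichever HTTP client is already in use.

```python
import json

# Endpoints quoted in this article; the manager address is supplied by the caller.
COUNT_ENDPOINTS = {
    "edge_clusters": "/api/v1/edge-clusters",
    "compute_collections": "/api/v1/fabric/compute-collections",
}


def count_urls(nsx_ip):
    """Return the full URL for each count query (hypothetical helper)."""
    return {name: f"https://{nsx_ip}{path}" for name, path in COUNT_ENDPOINTS.items()}


def result_count(response_body):
    """Extract result_count from a JSON response body (str or bytes)."""
    return int(json.loads(response_body)["result_count"])


if __name__ == "__main__":
    # Sample bodies containing only the field shown in the article.
    for name, body in [("edge_clusters", '{"result_count": 46}'),
                       ("compute_collections", '{"result_count": 2441}')]:
        print(name, result_count(body))
```

The article does not state a threshold at which the issue starts to occur, so the sketch only reports the counts and leaves the interpretation to the reader.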
Environment

VMware NSX-T 3.2.3 and earlier, 4.0.x, and 4.1.x.

Cause

In a scaled environment, during the NSX-T edge precheck, the Upgrade Coordinator (UC) loads all compute collections and matches them against the compute on which each edge is deployed. This workflow runs in parallel for all edge clusters at the same time. As a result, the UC runs out of memory and the edge precheck fails.
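One way to confirm that the UC hit this out-of-memory condition is to scan the wrapper log for the `java.lang.OutOfMemoryError` entry quoted in the Issue section. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the log has been copied somewhere Python can read it.

```python
# Marker text taken from the log excerpt in this article.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines):
    """Return (line_number, text) for every line that reports a JVM out-of-memory error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOM_MARKER in line
    ]


def scan_log(path):
    """Scan a log file on disk; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_oom_lines(handle)
```

For example, `scan_log("upgrade-coordinator-tomcat-wrapper.log")` returns an empty list when the log contains no out-of-memory entries.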

Resolution

This issue is resolved in VMware NSX 3.2.4.0 and 4.2.0, available at Broadcom downloads.

If you are having difficulty finding and downloading the software, please review the Download Broadcom products and software KB.

Workaround

Because the prechecks do not run, verify manually that there are no issues with the edge nodes by completing steps 1 to 5 below:

  1. For each edge node, get the real-time status by running the following API and ensure the status is green:
        GET https://<mgrIp>/api/v1/transport-nodes/<nodeid>/status?source=realtime
    Also check that "/tmp", "/image", "/var/log" and the root folder "/" have enough free space to accommodate the edge upgrade nub file.
  2. For each edge transport node, get the state and check that the state attribute reports success ("state": "success"):
        GET https://<mgrIp>/api/v1/transport-nodes/<nodeid>/state
  3. For each edge cluster, get the real-time status by running the following API and ensure the status is green:
        GET https://<mgrIp>/api/v1/edge-clusters/<edgeclusterId>/status?source=realtime
  4. Check in the UI for open alarms on edge transport nodes or routing, and resolve all alarms before proceeding with the upgrade.
    REST API to check for open alarms on an edge transport node:
        GET https://<mgrIp>/api/v1/alarms?status=OPEN&node_id=<nodeId>
  5. Ensure the datastores on which the edges reside have sufficient free space to accommodate the upgrade.
    The required free space is the number of edges from the edge cluster that reside on the datastore multiplied by the size of one edge.
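With many edge nodes, steps 1 to 5 mean a long list of calls. The sketch below generates the URLs for steps 1 to 4, checks the `"state": "success"` attribute from step 2, and does the free-space arithmetic from step 5. All function names are hypothetical, the edge size is a value the reader must supply, and the status responses are not interpreted because their field names are not given in this article.

```python
import json


def manual_check_urls(mgr_ip, node_ids, cluster_ids):
    """Build the GET URLs for steps 1-4 for every edge node and edge cluster."""
    base = f"https://{mgr_ip}/api/v1"
    urls = []
    for node_id in node_ids:
        urls.append(f"{base}/transport-nodes/{node_id}/status?source=realtime")  # step 1
        urls.append(f"{base}/transport-nodes/{node_id}/state")                   # step 2
        urls.append(f"{base}/alarms?status=OPEN&node_id={node_id}")              # step 4
    for cluster_id in cluster_ids:
        urls.append(f"{base}/edge-clusters/{cluster_id}/status?source=realtime")  # step 3
    return urls


def state_is_success(state_body):
    """Step 2: True when the response body carries "state": "success"."""
    return json.loads(state_body).get("state") == "success"


def required_free_space(edges_on_datastore, size_of_one_edge):
    """Step 5: number of edges on the datastore multiplied by the size of one edge.

    The unit is whatever unit size_of_one_edge is expressed in.
    """
    return edges_on_datastore * size_of_one_edge
```

For instance, `required_free_space(4, 200)` gives 800 in the same unit as the edge size; the figures 4 and 200 are made up for illustration.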

    Once steps 1 to 5 are complete, proceed with the upgrade:
    1. Reset the edge upgrade plan (to handle the edge node ordering check):
        POST https://<mgrIp>/api/v1/upgrade/plan?action=reset&component_type=EDGE
    2. Trigger the upgrade using the REST API, as the universal precheck is not mandatory:
        POST https://<mgrIp>/api/v1/upgrade/plan?component_type=EDGE&action=start
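The order of the two calls matters: the plan is reset first, then started. A minimal sketch of that sequence using only the Python standard library is shown below. The function name is hypothetical, the query strings use a single `&` separator, and sending the requests (with credentials and certificate handling appropriate to the environment) is deliberately left out.

```python
from urllib.request import Request


def edge_upgrade_requests(mgr_ip):
    """Return the two POST requests in the order they should be sent."""
    plan = f"https://{mgr_ip}/api/v1/upgrade/plan"
    return [
        # 1. Reset the edge upgrade plan.
        Request(f"{plan}?action=reset&component_type=EDGE", method="POST"),
        # 2. Start the edge upgrade.
        Request(f"{plan}?component_type=EDGE&action=start", method="POST"),
    ]


if __name__ == "__main__":
    for request in edge_upgrade_requests("mgr.example"):
        print(request.get_method(), request.full_url)
```

Building the requests separately from sending them makes it easy to review the exact URLs before anything is executed against the manager.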

 

After the edge nodes are upgraded using the workaround above, the NSX Manager precheck also fails. It depends on completion of the edge prechecks, which were bypassed, and this prevents the NSX Manager upgrade from starting.

 

Workaround for Manager prechecks failing:

Run the precheck for the NSX Manager nodes manually using the API, then trigger the upgrade using either the API or the NSX UI.

1. Run the NSX Manager prechecks only, bypassing the edge and host prechecks, using the following API:

POST https://<nsx-mgr-IP>/api/v1/upgrade?component_type=MP&action=execute_pre_upgrade_checks

2. After confirming that all checks have passed, trigger the NSX Manager upgrade from the NSX UI or using the following API:

POST https://<nsx-mgr-IP>/api/v1/upgrade/plan?action=upgrade&component_type=MP
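Hand-typed query strings are easy to get wrong, for example with a doubled separator or stray trailing punctuation. As a small sketch, the two manager URLs can be assembled with `urllib.parse.urlencode` instead. The function name is hypothetical, and the endpoints and parameters are exactly those listed in the two steps above.

```python
from urllib.parse import urlencode


def manager_upgrade_urls(mgr_ip):
    """Return (precheck_url, upgrade_url) for the NSX Manager (MP) component."""
    root = f"https://{mgr_ip}/api/v1/upgrade"
    precheck = f"{root}?" + urlencode(
        {"component_type": "MP", "action": "execute_pre_upgrade_checks"}
    )
    upgrade = f"{root}/plan?" + urlencode(
        {"action": "upgrade", "component_type": "MP"}
    )
    return precheck, upgrade


if __name__ == "__main__":
    precheck_url, upgrade_url = manager_upgrade_urls("nsx-mgr.example")
    print("POST", precheck_url)
    print("POST", upgrade_url)
```

Both URLs are used with POST, and the second should only be sent once the prechecks from the first have passed.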