NSX upgrade pre-check fails with: "Upgrade Agent on Edge Node <tn-uuid> is unreachable. Restart the Upgrade agent service and check network connectivity"

Article ID: 381962

Products

VMware NSX

Issue/Introduction

  • When running upgrade pre-checks prior to upgrading NSX, the following error occurs for Edge Transport Nodes:
    • "Upgrade Agent on Edge Node <tn-uuid> is unreachable. Restart the Upgrade agent service and check network connectivity."
  • All Edge Transport Nodes appear healthy. There are no active system alarms.
  • Communication from the Edge Transport Nodes to NSX Manager is working as expected.
  • The following log entries appear in /var/log/upgrade-coordinator/upgrade-coordinator.log on the NSX Manager:
    • RPC message is sent:
      20##-##-##T##:47:51.508Z  INFO pool-11-thread-14-rpc UpgradeMessagingServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Sending RPC to <transport-node-uuid>: [com.vmware.nsx.upgrade_agent.GetUpgradeStateMsg] bundle_url: "3.2.3.0.0.21703624/Edge/nub/VMware-NSX-edge-3.2.3.0.0.21703644.nub"
      
      RPC timeout:
      20##-##-##T##:48:01.497Z  INFO pool-11-thread-14 UpgradeAgentMessagingServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Sending command getUpgradeState  to node <transport-node-uuid> timedout.
      
      RPC no response, cancelling task:
      20##-##-##T##:48:01.497Z  INFO pool-11-thread-14 UpgradeAgentMessagingServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Didn't get a response; cancelling the RPC thread for getUpgradeState  on <transport-node-uuid>
      
      Unknown state / Unreachable:
      20##-##-##T##:48:01.497Z  INFO pool-11-thread-14 UpgradeAgentMessagingServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Got upgrade state UAGetStateResponse [upgradeState=UNKNOWN, toString()=UAResponse [responseHeader=ResponseHeader [commandState=CMD_IN_PROGRESS, info=Timeout in messaging layer]]] from <transport-node-uuid>
      20##-##-##T##:48:01.497Z ERROR pool-11-thread-14 UpgradeStateHelper - - [nsx@6876 comp="nsx-manager" errorCode="MP30089" level="ERROR" subcomp="upgrade-coordinator"] Got unexpected appliance upgrade state UAGetStateResponse [upgradeState=UNKNOWN, toString()=UAResponse [responseHeader=ResponseHeader [commandState=CMD_IN_PROGRESS, info=Timeout in messaging layer]]] from node <transport-node-uuid>
      20##-##-##T##:48:01.563Z  INFO pool-11-thread-14 InspectionTask - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] UU after UA status update: UpgradeUnit [id=<transport-node-uuid>, TransportNodeID =<transport-node-uuid>, name=<edge-name>, description=null, type=EDGE, upgradeUnitSubtype=RESOURCE, currentVersion=3.0.2.0.0.16887208, warnings=[Unknown upgrade state returned from unit with id <transport-node-uuid>], errors=null, metaData={IP Address=x.x.x.x, Compute Id=xxxx-xxxx-xxxx-xxxx-xxxx:domain-xxxx, Number of logical routers on edge node.=x}, rebooting=false, UaReportedStatus=UNKNOWN, extendedConfiguration=null, progressTracker=UpgradeUnitProgressCollectorImpl [reference=<transport-node-uuid>, referenceType=EDGE, getProgressPercentage()=0, getUpgradeStatus()=NOT_STARTED, getLastProgressMessage()=null, getParent()=ReferencedProgressCollectorImpl [reference=xxxx-xxxx-xxxx-xxxx-xxxx, getProgressPercentage()=0, getUpgradeStatus()=NOT_STARTED, getLastProgressMessage()=null]], class=class com.vmware.nsx.management.upgrade.model.UpgradeUnit, disabled=false, isSkipUpgrade=false, hashCode=xxxxxxx]
      20##-##-##T##:48:01.597Z  WARN pool-11-thread-14 UpgradeServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="upgrade-coordinator"] [PUC] Pre-upgrade check InspectionTaskInfo[id=<null>,name=Check status of upgrade agent,description=Check status of upgrade agent,componentType=EDGE] failed with result BasicInspectionTaskResult{status=FAILURE, taskInfo=InspectionTaskInfo[id=<null>,name=Check status of upgrade agent,description=Check status of upgrade agent,componentType=EDGE], failureMessages=null, failures=[{"moduleName":"upgrade-coordinator","errorCode":30249,"errorMessage":"Upgrade Agent on Edge node <transport-node-uuid> is unreachable. Restart the Upgrade agent service and check network connectivity."}]}
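To confirm that a failed pre-check matches this pattern, the log can be searched for the signature strings shown in the entries above. The following is a minimal Python sketch, not a supported tool: the substrings are copied from the log excerpts, while the function names and dictionary labels are hypothetical and chosen only for illustration. It assumes each log line starts with a timestamp, as in the excerpts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Log location named in this article.
LOG_PATH = "/var/log/upgrade-coordinator/upgrade-coordinator.log"

# Substrings copied from the log excerpts above.
# The labels on the left are hypothetical names used only in this sketch.
SIGNATURES = {
    "rpc_timeout": "timedout",
    "rpc_cancelled": "cancelling the RPC thread",
    "unknown_state": "upgradeState=UNKNOWN",
    "agent_unreachable": '"errorCode":30249',
}


def find_agent_timeouts(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each signature label to the timestamps of the lines containing it."""
    hits = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        timestamp = parts[0]  # assumes the line begins with a timestamp
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits[label].append(timestamp)
    return dict(hits)


def scan_file(path: str = LOG_PATH) -> Dict[str, List[str]]:
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_agent_timeouts(handle)
```

Calling `scan_file()` against the log (or a copy of it) returns the matching timestamps per signature. Timeout, cancellation, and unknown-state entries that share nearly the same timestamp for one node are consistent with the sequence described above.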

 

 

Environment

VMware NSX 3.0.x

VMware NSX 3.2.x

Cause

The RPC message from the Upgrade Coordinator to the upgrade agent on the Edge node fails to complete because of an Upgrade Coordinator CPU spike or a brief network interruption at the time the pre-checks are run.

Resolution

Rerun the NSX upgrade pre-checks and confirm that they complete successfully.
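Because the cause is transient, a rerun after a short pause is usually all that is needed. Where pre-checks are driven from automation, the rerun can be sketched as a bounded retry loop. This is an illustrative sketch only: `run_precheck` is a hypothetical callable that triggers the pre-checks by whatever means is in use and returns True on success, and the attempt count and wait time are assumptions, not values from this article.

```python
import time
from typing import Callable


def rerun_until_success(
    run_precheck: Callable[[], bool],
    attempts: int = 3,            # assumed limit, not a documented value
    wait_seconds: float = 60.0,   # assumed pause between reruns
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Rerun the pre-check until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        if run_precheck():
            return True
        if attempt < attempts:
            # Give a CPU spike or network blip time to clear before retrying.
            sleep(wait_seconds)
    return False
```

If the pre-check still fails after several reruns, the failure is unlikely to be the transient condition described here and should be investigated further.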