Edge node is reporting as UNKNOWN state in the NSX Manager

Article ID: 369296

Products

VMware NSX

Issue/Introduction

  • An Edge node is reported in the UNKNOWN state in NSX Manager.
  • The issue is seen in NSX-T version 3.1.2.0.
  • GET https://<NSX-T_manager-IP>/api/v1/transport-nodes/<tn-id>/status reports the Edge node as UNKNOWN.
  • The upgrade pre-check fails because the Edge node is in the UNKNOWN state.
  • /var/log/upgrade-coordinator/upgrade-coordinator.log shows:

2024-05-28T03:37:55.878Z WARN http-nio-127.0.0.1-7442-exec-5 EdgeUuUtilsServiceImpl 18005 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="upgrade-coordinator"] Detect issues with Edge upgrade unit fabricId eba35663-####-#####-####-0ee94b87aeba TransportNodeId eba35663-####-#####-####-0ee94b87aeba: [Pnic status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN., Overall status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN., Tunnel status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN.]

  • /var/log/proton/nsxapi.log on the NSX Manager node reports a ConcurrentUpdateException:

2024-05-28T05:59:08.095Z WARN gsr-summation-cache-committer-1 ObjectsView 32276 TXEnd[TX[7e74]] Aborted Exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=16, sequence=1162255862) | Failed Transaction ID = c59dc1b6-####-#####-####-b10cb567e74 | Offending Address = 1162255877 | Conflict Key = A7F0C534358D5655 | Conflict Stream = 8d4e54c2-####-#####-####-c04be92c4c52 | Cause = CONFLICT | Time = 579 ms
2024-05-28T05:59:08.096Z WARN gsr-summation-cache-committer-1 CorfuDbTransactionManager 32276 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Received TransactionAbortedException from the Corfu client.
2024-05-28T05:59:08.107Z WARN gsr-summation-cache-committer-1 CorfuDbTransactionManager 32276 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] com.vmware.nsx.management.container.exceptions.ConcurrentUpdateException: STREAM_ID = 8d4e54c2-####-#####-####-c04be92c4c52 | CONFLICT_VALUE = GenericStatsRecords [prefix=null, counterValues=[315042970, 108862342068, 8477336, 8406909], gcClassName=null, gcMethodName=null, createdTime=1704733060090, lastUpdateTime=1716875753032] | CONFLICT_KEY_HASH = -6345355046937602475 | CONFLICT_KEY = SummationGenericStatsRecords2/DFWRuleStats?0?D?1135 | MAP_NAME = nsx-manager SummationGenericStatsRecords2 8b7e | TRANSACTION_ID = c59dc1b6-####-#####-####-b10cb567e74 | OFFENDING_ADDRESS = 1162255877
2024-05-28T05:59:08.107Z ERROR gsr-summation-cache-committer-1 TransactionHelper 32276 - [nsx@6876 comp="nsx-manager" errorCode="MP6408" level="ERROR" subcomp="manager"] Commit failed
2024-05-28T05:59:08.224Z ERROR aggprocessor-wait-for-collection-timer AggregationProcessorImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" errorCode="MP6405" level="ERROR" subcomp="manager"] gsrSummationCache gsr-summation-cache commit result is not normal, result = CommitResult [dataCommitErrors=123, workerCommitErrors=0]
2024-05-28T05:59:08.224Z ERROR aggprocessor-wait-for-collection-timer AggregationProcessorImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" errorCode="MP6401" level="ERROR" subcomp="manager"] Summation statistics cache commit encountered error

  • /var/log/proton/nsxapi.log also shows the transport node status being set to unknown because of a timeout for the node:

2024-05-28T06:00:01.667Z INFO http-nio-127.0.0.1-7440-exec-46 TransportNodeLcmFacadeImpl 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="2920769e-####-#####-####-ac02d1086a22" subcomp="manager" username="nsx_policy"] TransportNodeFacade : getTransportNode(..) for id [eba35663-####-#####-####0ee94b87aeba]
2024-05-28T06:00:01.702Z INFO http-nio-127.0.0.1-7440-exec-33 HeatMapServiceImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Updated Tunnel connection status for TransportNode eba35663-####-#####-####0ee94b87aeba
2024-05-28T06:00:01.719Z INFO http-nio-127.0.0.1-7440-exec-33 EdgeNodeInstallInfo 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Node EdgeNodeInstallInfo/eba35663-####-#####-####-0ee94b87aeba State: NODE_READY TN Config State: TRANSPORT_NODE_SYNC_PENDING
2024-05-28T06:00:01.719Z INFO http-nio-127.0.0.1-7440-exec-33 EdgeNodeInstallInfo 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Node EdgeNodeInstallInfo/eba35663-####-#####-####-0ee94b87aeba State: NODE_READY TN Config State: TRANSPORT_NODE_SYNC_PENDING
2024-05-28T06:01:04.734Z INFO HeatMap-ConnCheck-Thread HeatmapConnCheckService 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] node eba35663-####-#####-####-0ee94b87aeba ccp update timeout, time stamp: current 1716876064733, ccp 1716874460829, interval 360000 in milliseconds
2024-05-28T06:01:04.763Z INFO HeatMap-ConnCheck-Thread HeatmapConnCheckService 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] update node status to unknown due to timeout for node eba35663-####-#####-####0ee94b87aeba
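
The "ccp update timeout" line above can be checked by hand: current minus ccp is 1716876064733 - 1716874460829 = 1603904 ms (roughly 26.7 minutes), well above the 360000 ms (6 minute) interval, which is consistent with the status then being set to unknown. A minimal shell sketch of that arithmetic follows. The function name ccp_gap is hypothetical, and the field layout is assumed to match the log line quoted above.

```shell
# Hypothetical helper: read one "ccp update timeout" log line and report the
# gap between the manager's current timestamp and the last CCP update.
ccp_gap() {
    line=$1
    # Pull the three numeric fields out of "current N, ccp N, interval N".
    current=$(printf '%s\n' "$line" | sed -n 's/.*current \([0-9][0-9]*\), ccp.*/\1/p')
    ccp=$(printf '%s\n' "$line" | sed -n 's/.*, ccp \([0-9][0-9]*\), interval.*/\1/p')
    interval=$(printf '%s\n' "$line" | sed -n 's/.*interval \([0-9][0-9]*\) in milliseconds.*/\1/p')
    # Give up if the line does not have the expected shape.
    if [ -z "$current" ] || [ -z "$ccp" ] || [ -z "$interval" ]; then
        echo "no ccp timeout fields found" >&2
        return 1
    fi
    gap=$((current - ccp))
    if [ "$gap" -gt "$interval" ]; then
        echo "gap_ms=$gap interval_ms=$interval stale=yes"
    else
        echo "gap_ms=$gap interval_ms=$interval stale=no"
    fi
}
```

For example, grep 'ccp update timeout' /var/log/proton/nsxapi.log | tail -n 1 yields a line that can be passed to ccp_gap as a single quoted argument.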

Environment

NSX-T version 3.1.2.0

Cause

The update action was aborted because of a ConcurrentUpdateException, and the ConcurrentUpdateException was converted to a TransactionAbortedException.
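
To see whether a manager node shows this signature, the strings quoted in the log excerpts of this article can be counted in nsxapi.log. The following is a rough sketch only: count_signatures is a hypothetical helper name, the patterns are copied from the excerpts above, and a non-zero count alone does not prove this specific issue.

```shell
# Hypothetical helper: count log lines matching the signatures quoted in
# this article. Defaults to the nsxapi.log path named in the article.
count_signatures() {
    log=${1:-/var/log/proton/nsxapi.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    for pattern in \
        'ConcurrentUpdateException' \
        'TransactionAbortedException' \
        'errorCode="MP6408"' \
        'update node status to unknown due to timeout'
    do
        # grep -c prints the number of matching lines; it exits non-zero
        # when that number is 0, so the status is deliberately ignored.
        n=$(grep -c -F -- "$pattern" "$log" || true)
        printf '%s\t%s\n' "$n" "$pattern"
    done
}
```

Running count_signatures with no argument reads /var/log/proton/nsxapi.log; a different file, such as a copy extracted from a support bundle, can be passed as the first argument.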

Resolution

This is a known issue and the engineering team is aware of it. It will be resolved in a future release.

As a workaround, you can restart the proton service on the manager nodes.

/etc/init.d/proton status    # check the status of the service
/etc/init.d/proton restart   # restart the proton service

Note: Make sure the proton service is up and the cluster is stable before restarting the service on the second and third nodes.
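
After the restarts, the status call from the Issue section can be repeated to see whether the Edge node has left the UNKNOWN state. The sketch below runs from a workstation that can reach the manager. The function names, the NSX_USER variable, and the default user name admin are assumptions; only the URL path comes from this article.

```shell
# Build the status URL quoted in the Issue section.
# $1 = NSX Manager IP or FQDN, $2 = transport node ID.
tn_status_url() {
    printf 'https://%s/api/v1/transport-nodes/%s/status\n' "$1" "$2"
}

# Hypothetical wrapper: fetch the status document with curl.
# -k skips certificate verification (remove it if the manager certificate
# is trusted); -u with only a user name makes curl prompt for the password.
fetch_tn_status() {
    curl -k -s -u "${NSX_USER:-admin}" -H 'Accept: application/json' \
        "$(tn_status_url "$1" "$2")"
}
```

Calling fetch_tn_status <NSX-T_manager-IP> <tn-id> prints the JSON response, which can then be inspected for the reported state.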