The edge node is in UNKNOWN state in NSX Manager.
GET https://<NSX-T_manager-IP>/api/v1/transport-nodes/<tn-id>/status reports the edge node as UNKNOWN:
curl -X GET -k -u admin:'############' "https://nsx-manager-ip/api/v1/transport-nodes/<transport-node-id>/status"
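The status call can be wrapped in a small script when several transport nodes need checking. This is a minimal sketch, not an official tool: the function names has_unknown_state and check_tn are hypothetical, and rather than assume a particular field layout in the response, the check only looks for the literal value "UNKNOWN" anywhere in the returned JSON.

```shell
# Hypothetical helper: succeed if any JSON string value on stdin is exactly "UNKNOWN".
has_unknown_state() {
    grep -q '"UNKNOWN"'
}

# Hypothetical wrapper around the status call.
# $1 = NSX Manager address, $2 = transport node ID.
# curl prompts for the admin password because only the user name is passed.
check_tn() {
    # -f makes curl return non-zero on HTTP errors so a failed call is not
    # mistaken for a healthy node.
    resp=$(curl -s -f -k -u admin "https://$1/api/v1/transport-nodes/$2/status") || {
        echo "transport node $2: status request failed" >&2
        return 2
    }
    if printf '%s\n' "$resp" | has_unknown_state; then
        echo "transport node $2: UNKNOWN state reported"
    else
        echo "transport node $2: no UNKNOWN state reported"
    fi
}

# Example call (placeholders as in the curl command above):
#   check_tn nsx-manager-ip <transport-node-id>
```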
The upgrade coordinator log on the orchestrator node also shows the edge node in UNKNOWN state. To locate the orchestrator node, run get service install-upgrade and note down the IP of the orchestrator node.
/var/log/upgrade-coordinator/upgrade-coordinator.log:
2024-05-28T03:37:55.878Z WARN http-nio-127.0.0.1-7442-exec-5 EdgeUuUtilsServiceImpl 18005 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="upgrade-coordinator"] Detect issues with Edge upgrade unit fabricId eba35663-####-#####-####-0ee94b87aeba TransportNodeId eba35663-####-#####-####-0ee94b87aeba: [Pnic status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN., Overall status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN., Tunnel status of the edge transport node eba35663-####-#####-####-0ee94b87aeba is UNKNOWN.]
/var/log/proton/nsxapi.log on the NSX Manager node reports ConcurrentUpdateException:
2024-05-28T05:59:08.095Z WARN gsr-summation-cache-committer-1 ObjectsView 32276 TXEnd[TX[7e74]] Aborted Exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=16, sequence=1162255862) | Failed Transaction ID = c59dc1b6-####-#####-####-b10cb567e74 | Offending Address = 1162255877 | Conflict Key = A7F0C534358D5655 | Conflict Stream = 8d4e54c2-####-#####-####-c04be92c4c52 | Cause = CONFLICT | Time = 579 ms
2024-05-28T05:59:08.096Z WARN gsr-summation-cache-committer-1 CorfuDbTransactionManager 32276 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Received TransactionAbortedException from the Corfu client.
2024-05-28T05:59:08.107Z WARN gsr-summation-cache-committer-1 CorfuDbTransactionManager 32276 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] com.vmware.nsx.management.container.exceptions.ConcurrentUpdateException: STREAM_ID = 8d4e54c2-####-#####-####-c04be92c4c52 | CONFLICT_VALUE = GenericStatsRecords [prefix=null, counterValues=[315042970, 108862342068, 8477336, 8406909], gcClassName=null, gcMethodName=null, createdTime=1704733060090, lastUpdateTime=1716875753032] | CONFLICT_KEY_HASH = -6345355046937602475 | CONFLICT_KEY = SummationGenericStatsRecords2/DFWRuleStats?0?D?1135 | MAP_NAME = nsx-manager SummationGenericStatsRecords2 8b7e | TRANSACTION_ID = c59dc1b6-####-#####-####-b10cb567e74 | OFFENDING_ADDRESS = 1162255877
2024-05-28T05:59:08.107Z ERROR gsr-summation-cache-committer-1 TransactionHelper 32276 - [nsx@6876 comp="nsx-manager" errorCode="MP6408" level="ERROR" subcomp="manager"] Commit failed
2024-05-28T05:59:08.224Z ERROR aggprocessor-wait-for-collection-timer AggregationProcessorImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" errorCode="MP6405" level="ERROR" subcomp="manager"] gsrSummationCache gsr-summation-cache commit result is not normal, result = CommitResult [dataCommitErrors=123, workerCommitErrors=0]
2024-05-28T05:59:08.224Z ERROR aggprocessor-wait-for-collection-timer AggregationProcessorImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" errorCode="MP6401" level="ERROR" subcomp="manager"] Summation statistics cache commit encountered error
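To gauge how often these signatures occur, the log can be searched for the strings seen in the excerpt above. A minimal sketch, assuming shell access to the manager node or a copy of the log; the function name count_log_hits is hypothetical.

```shell
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# $1 = string to look for (matched literally, not as a regex), $2 = log file path.
count_log_hits() {
    # grep -c prints the count and exits 1 when it is zero; "|| true" keeps
    # that from aborting a script that runs under "set -e".
    grep -c -F -- "$1" "$2" || true
}

# Example calls using strings taken from the log excerpt:
#   count_log_hits 'ConcurrentUpdateException' /var/log/proton/nsxapi.log
#   count_log_hits 'errorCode="MP6408"' /var/log/proton/nsxapi.log
```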
/var/log/proton/nsxapi.log also reports that the transport node status was updated to unknown due to a timeout on the node:
2024-05-28T06:00:01.667Z INFO http-nio-127.0.0.1-7440-exec-46 TransportNodeLcmFacadeImpl 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="2920769e-####-#####-####-ac02d1086a22" subcomp="manager" username="nsx_policy"] TransportNodeFacade : getTransportNode(..) for id [eba35663-####-#####-####0ee94b87aeba]
2024-05-28T06:00:01.702Z INFO http-nio-127.0.0.1-7440-exec-33 HeatMapServiceImpl 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Updated Tunnel connection status for TransportNode eba35663-####-#####-####0ee94b87aeba
2024-05-28T06:00:01.719Z INFO http-nio-127.0.0.1-7440-exec-33 EdgeNodeInstallInfo 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Node EdgeNodeInstallInfo/eba35663-####-#####-####-0ee94b87aeba State: NODE_READY TN Config State: TRANSPORT_NODE_SYNC_PENDING
2024-05-28T06:00:01.719Z INFO http-nio-127.0.0.1-7440-exec-33 EdgeNodeInstallInfo 32276 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="180dc84b-####-#####-####-41e8d3fe5629" subcomp="manager" username="nsx_policy"] Node EdgeNodeInstallInfo/eba35663-####-#####-####-0ee94b87aeba State: NODE_READY TN Config State: TRANSPORT_NODE_SYNC_PENDING
2024-05-28T06:01:04.734Z INFO HeatMap-ConnCheck-Thread HeatmapConnCheckService 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] node eba35663-####-#####-####-0ee94b87aeba ccp update timeout, time stamp: current 1716876064733, ccp 1716874460829, interval 360000 in milliseconds
2024-05-28T06:01:04.763Z INFO HeatMap-ConnCheck-Thread HeatmapConnCheckService 32276 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] update node status to unknown due to timeout for node eba35663-####-#####-####0ee94b87aeba
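The "ccp update timeout" line carries enough data to check the timeout by hand: 1716876064733 - 1716874460829 = 1603904 ms, roughly 26.7 minutes, which is well beyond the logged interval of 360000 ms (6 minutes). The same arithmetic can be scripted as below; this is a sketch that assumes the line format shown in the excerpt, and the function name ccp_gap_ms is hypothetical.

```shell
# Hypothetical helper: read one "ccp update timeout" log line on stdin and
# print how many milliseconds the ccp time stamp lags behind the current one.
ccp_gap_ms() {
    line=$(cat)
    # Pull out the digits that follow "current " and "ccp " up to the comma.
    current=$(printf '%s\n' "$line" | sed -n 's/.*current \([0-9][0-9]*\),.*/\1/p')
    ccp=$(printf '%s\n' "$line" | sed -n 's/.*ccp \([0-9][0-9]*\),.*/\1/p')
    echo $(( ${current:-0} - ${ccp:-0} ))
}

# Example call:
#   grep 'ccp update timeout' /var/log/proton/nsxapi.log | tail -n 1 | ccp_gap_ms
```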
NSX-T version 3.1.2.0
The update action was aborted because of a ConcurrentUpdateException, and the ConcurrentUpdateException was converted to a TransactionAbortedException.
This is a known issue, and the engineering team is aware of it. The issue is resolved in NSX 3.2.x and later releases.
As a workaround, you can restart the proton service on the manager nodes.
/etc/init.d/proton status #check the status of the service
/etc/init.d/proton restart #restart the proton service
Note: After restarting the service on the first node, make sure the proton service is up and the cluster is stable before restarting the service on the second and third nodes.
To confirm that the NSX cluster is stable, log in to any NSX Manager node as admin and run the command get cluster status | find Status
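When the output of that command is saved to a file, the stability check between restarts can be scripted on a workstation. This is a minimal sketch under one loud assumption: every line returned by get cluster status | find Status ends with the status value, and STABLE is the healthy value. Verify this against the output of your own release before relying on it. The function name all_status_stable is hypothetical.

```shell
# Hypothetical helper: read saved "get cluster status | find Status" output on
# stdin and succeed only if at least one Status line was seen and every one of
# them ends in STABLE (assumed output format, see the note above).
all_status_stable() {
    awk '/Status/ { seen = 1; if ($NF != "STABLE") bad = 1 }
         END { exit !(seen && !bad) }'
}

# Example call, with the CLI output saved to a local file first:
#   all_status_stable < cluster-status.txt && echo "safe to restart the next node"
```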