Edge nodes newly deployed via the NSX API are stuck in the Configuration State "Node Not Ready".
CPU usage on the existing edge nodes becomes high.
The Edge VMs are deployed successfully and are able to connect to the NSX Manager on port 1234, but not to the CCP on port 1235. In the NSX UI, the Configuration State shows as "Node Not Ready", Manager Connectivity as "Up", and Controller Connectivity as "Down".
The manager connection is established successfully, whereas the CCP session reports "OTHER_ERROR".
root@NSX_EDGE:~# su admin -c "get managers"
Mon Dec 02 2024 UTC 08:06:44.023
- 10.##.##.01 Connected (NSX-RPC)
- 10.##.##.02 Connected (NSX-RPC) *
- 10.##.##.03 Connected (NSX-RPC)
root@NSX_EDGE:~# su admin -c "get controllers"
Mon Dec 02 2024 UTC 08:06:56.292
Controller IP Port SSL Status Is Physical Master Session State Controller FQDN Failure Reason
10.##.##.01 1235 enabled not used false null NA NA
10.##.##.02 1235 enabled disconnected true down NA OTHER_ERROR <=========== CCP is not UP
10.##.##.03 1235 enabled not used false null NA NA
The GET API for transport node state shows the node state as "NODE_NOT_READY" and the failure message as "Waiting for edge node to be ready."
GET https://10.##.##.01/api/v1/transport-nodes/########-4107-####-bb04-############/state
{
  "transport_node_id": "########-4107-####-bb04-############",
  "maintenance_mode_state": "DISABLED",
  "node_deployment_state": {
    "state": "NODE_NOT_READY",
    "failure_message": "",
    "failure_code": -1
  },
  "hardware_version": "vmx-##",
  "state": "pending",
  "details": [
    {
      "sub_system_id": "########-4107-####-bb04-############",
      "sub_system_type": "Host",
      "state": "pending",
      "failure_message": "Waiting for edge node to be ready."
    }
  ]
}
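The same state API can be polled from a shell with curl while waiting for the node to become ready. A minimal sketch, assuming admin credentials and a placeholder transport node UUID (adjust the manager IP and UUID to your environment):
# Poll the transport node state API; -k skips certificate validation (lab use only).
curl -k -u admin 'https://10.##.##.01/api/v1/transport-nodes/<transport-node-uuid>/state'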
To check for a high count of EdgeConfigUpdateMsg or EdgeSystemInfoMsg messages sent by the edge nodes to the MP:
Per-minute count of EdgeConfigUpdateMsg or EdgeSystemInfoMsg messages sent from all edge nodes:
/var/log/proton$ grep "Receive EdgeConfigUpdateMsg" nsxapi* | grep "2024-12-01T09:21" | wc -l
173
/var/log/proton$ grep "Receive EdgeSystemInfoMsg" nsxapi* | grep "2025-08-04T04:36" | wc -l
380
This indicates that approximately 173 EdgeConfigUpdateMsg and 380 EdgeSystemInfoMsg messages are sent per minute by all edge nodes combined.
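To see how the message rate varies over a longer window, the same grep can be looped over each minute of an hour. A minimal sketch, assuming the nsxapi logs are in /var/log/proton and the date/hour prefix is adjusted to the incident window:
# Per-minute EdgeConfigUpdateMsg count for the hour 2024-12-01T09 (adjust date and hour).
for m in $(seq -w 0 59); do
  c=$(grep "Receive EdgeConfigUpdateMsg" /var/log/proton/nsxapi* | grep -c "2024-12-01T09:$m")
  echo "2024-12-01T09:$m -> $c"
done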
Per-minute EdgeConfigUpdateMsg or EdgeSystemInfoMsg count for a single edge node:
/var/log/proton$ grep "EdgeConfigUpdateMsg for fabric edge node" nsxapi* | grep "2024-12-01T09:21" | grep ########-b435-####-a299-############
nsxapi.2.log:2024-12-01T09:21:15.352Z INFO EdgeTNRpcRequestRouter2 EdgeTNConfigUpdateRequestHandler 77172 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Received message: EdgeConfigUpdateMsg for fabric edge node: ########-b435-####-a299-############
nsxapi.2.log:2024-12-01T09:21:35.972Z INFO EdgeTNRpcRequestRouter5 EdgeTNConfigUpdateRequestHandler 77172 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Received message: EdgeConfigUpdateMsg for fabric edge node: ########-b435-####-a299-############
nsxapi.2.log:2024-12-01T09:21:59.148Z INFO EdgeTNRpcRequestRouter4 EdgeTNConfigUpdateRequestHandler 77172 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Received message: EdgeConfigUpdateMsg for fabric edge node: ########-b435-####-a299-############
This indicates that approximately 3 EdgeConfigUpdateMsg and 8 EdgeSystemInfoMsg messages are sent per minute by a single edge node, even though there are no configuration changes on the edge node.
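To identify which edge nodes are generating the most messages, the node UUID at the end of each matched log line can be tallied. A minimal sketch, assuming the UUID is the last whitespace-separated field:
# EdgeConfigUpdateMsg count per edge node UUID for one minute (adjust the timestamp).
grep "EdgeConfigUpdateMsg for fabric edge node" /var/log/proton/nsxapi* \
  | grep "2024-12-01T09:21" \
  | awk '{print $NF}' | sort | uniq -c | sort -rn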
Queue Saturation Evidence:
grep "Failed to handle message due to RejectedExecutionException Task" nsxapi.*
Example output showing ThreadPoolExecutor at maximum capacity:
nsxapi.1.log:2025-08-18T20:28:55.767Z ERROR nsx-rpc:RPC_PROXY_CONN_PROVIDER:user-executor-9 InboundMessageRouter 1069292 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP4002" level="ERROR" subcomp="manager"] Failed to handle message due to RejectedExecutionException Task com.vmware.nsx.messaging.service.impl.InboundMessageRouter$HandlerExecutor$$Lambda$2463/0x000075771ce7a040@67c18fc7 rejected from java.util.concurrent.ThreadPoolExecutor@2021deef[Running, pool size = 5, active threads = 5, queued tasks = 1000, completed tasks = 2057669]. <session-id>:<message-id> = 'null:null', clientType = 'cvn-edge', application = 'EdgeVertical'
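The overall volume of these rejections can be summarized per log file to gauge how long the handler pool has been saturated. A minimal sketch, run from /var/log/proton on a manager node:
# Per-file count of inbound messages rejected by the saturated thread pool.
grep -c "Failed to handle message due to RejectedExecutionException" nsxapi.*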
AppInitHandshake timeouts seen on the newly created Edge:
/var/log$ grep "AppInitHandshake timed-out" syslog
2025-07-30T06:47:43.614Z edge_node NSX 3239 - [nsx@6876 comp="nsx-edge" subcomp="mpa-client" tid="3513" level="INFO"] [EdgeVertical] AppInitHandshake timed-out. Seq (1)
2025-07-30T06:48:13.614Z edge_node NSX 3239 - [nsx@6876 comp="nsx-edge" subcomp="mpa-client" tid="3515" level="INFO"] [EdgeVertical] AppInitHandshake timed-out. Seq (2)
2025-07-30T06:48:43.615Z edge_node NSX 3239 - [nsx@6876 comp="nsx-edge" subcomp="mpa-client" tid="3516" level="INFO"] [EdgeVertical] AppInitHandshake timed-out. Seq (3)
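The number of handshake retries on the new edge can be counted the same way. A minimal sketch, run from /var/log on the affected edge node:
# Count AppInitHandshake timeouts logged by the mpa-client on the edge.
grep -c "AppInitHandshake timed-out" syslog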
VMware NSX
The edge nodes send a high count of EdgeConfigUpdateMsg or EdgeSystemInfoMsg messages at a short interval of 5 seconds. If no reply is received from the MP, the edge re-sends the EdgeConfigUpdateMsg / EdgeSystemInfoMsg.
The EdgeConfigUpdateMsg or EdgeSystemInfoMsg count is therefore high on the manager nodes, leading to manager overload. AppInitMsg messages from new edge nodes are not replied to, resulting in edge nodes stuck in the "Node Not Ready" state.
On a large-scale setup, the Edge MP gets overloaded with EdgeConfigUpdateMsg or EdgeSystemInfoMsg messages arriving from the edges every 5 seconds, which it cannot acknowledge in time. Because of this, incoming AppInitMsg messages from new edge nodes are not answered by the manager, and AppInitHandshake timeouts are seen on the newly created edges.
The EdgeConfigUpdateMsg and EdgeSystemInfoMsg issues are resolved in VMware NSX 4.2.3, available at Broadcom Downloads. If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Option 1: Manager Reboot
Option 2: Ops Agent Service Management (Recommended)
# stop service nsx-opsagent
# start service nsx-opsagent
Implementation Steps:
stop service nsx-opsagent
start service nsx-opsagent
Important: After completing your edge node deployments, restart the nsx-opsagent service on the standby edges where it was stopped to restore normal operations and monitoring capabilities.
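A minimal sketch of the sequence on each standby edge node, assuming NSX CLI access as admin; the ordering around the new deployments is inferred from the note above:
# On each standby edge flooding the MP, stop the ops agent before deploying new edges.
stop service nsx-opsagent
# ... deploy the new edge nodes and wait for them to reach the ready state ...
# After the deployments complete, restart the ops agent to restore monitoring.
start service nsx-opsagent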
Monitoring: Use the provided grep commands to monitor EdgeConfigUpdateMsg and EdgeSystemInfoMsg volumes before and after implementing the workaround to verify message reduction.
For an additional workaround, please also refer to NSX Manager cluster intermittently goes into degraded state and NSX UI becomes inaccessible with error code 101.