In VMware NSX for vSphere 6.x, the NSX Edge experiences high CPU utilization and/or fails to accept configuration changes
Article ID: 345647
Updated On: 02-03-2025
Products
VMware NSX for vSphere
Issue/Introduction
Symptoms:
NSX Edge experiences high CPU utilization
NSX Edge fails to accept configuration changes
Running the show log command on the NSX Manager console reports entries similar to:
2015-10-15 09:31:32.473 UTC ERROR TaskFrameworkExecutor-1 PublishUtils:92 - Timeout happened during execution of jobId 'jobdata-173581' for edgeId 'edge-##', startTime '1444899795922' currentTime '1444901492473': doingRollback 'false'
2015-10-15 09:31:32.473 UTC ERROR TaskFrameworkExecutor-1 PublishTask:346 - Failed jobId 'jobdata-173581' for edge 'edge-18' during publishing.
com.vmware.vshield.edge.exception.VshieldEdgeException: vShield Edge:10163:Publish Job jobdata-173581 for NSX Edge edge-## timed out. It has already taken 28 minutes, hence was aborted and rollback has been performed.
The show log output on the NSX Manager console may also report entries similar to:
2015-10-15 09:21:32.453 UTC INFO TaskFrameworkExecutor-1 AbstractEdgeApplianceManager:643 - The vse command is being sent to 'vm-#####' over msgBus
2015-10-15 09:31:32.453 UTC INFO messagingTaskExecutor-10 QueueSubscriptionManager:252 - Purging queue 'vse_5031887e-####-####-####-e9e1c2bf6b94_request_queue'. No wait = 'true'.
2015-10-15 09:31:32.457 UTC INFO messagingTaskExecutor-10 VirtualMachineVcOperationsImpl:54 - Retrieving power-state for VM '######'
2015-10-15 09:31:32.462 UTC INFO messagingTaskExecutor-10 VirtualMachineVcOperationsImpl:57 - Power-state for VM '#####' = 'poweredOn'
2015-10-15 09:31:32.462 UTC INFO messagingTaskExecutor-10 EdgeUtils:302 - SysEvent-Detailed-Message :(Kept only in logs) :: Rpc request to vm: vm-##### timed out
2015-10-15 09:31:32.466 UTC INFO messagingTaskExecutor-10 SystemEventDaoImpl:128 - [SystemEvent] Time:'Thu Oct 15 09:31:32.462 UTC 2015', Severity:'Major', Event Source:'vm-31231', Code:'30014', Event Message:'Failed to communicate with the NSX Edge VM.', Module:'NSX Edge Commnication Agent
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
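The startTime and currentTime values in the PublishUtils entry appear to be epoch timestamps in milliseconds (the currentTime value in the sample corresponds to the 09:31:32 UTC log timestamp). Under that assumption, the following minimal Python sketch extracts the job ID, Edge ID, and elapsed time from such an entry. The function name and regular expression are illustrative and are not part of NSX.

```python
import re

# Matches the PublishUtils timeout entry shown in the excerpt above.
# Assumption: startTime and currentTime are epoch milliseconds.
TIMEOUT_RE = re.compile(
    r"jobId '(?P<job>[^']+)' for edgeId '(?P<edge>[^']+)', "
    r"startTime '(?P<start>\d+)' currentTime '(?P<current>\d+)'"
)


def publish_timeout_minutes(line):
    """Return (job_id, edge_id, elapsed_minutes), or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    elapsed_ms = int(match.group("current")) - int(match.group("start"))
    return match.group("job"), match.group("edge"), elapsed_ms / 60000.0


sample = (
    "2015-10-15 09:31:32.473 UTC ERROR TaskFrameworkExecutor-1 PublishUtils:92 - "
    "Timeout happened during execution of jobId 'jobdata-173581' for edgeId 'edge-##', "
    "startTime '1444899795922' currentTime '1444901492473': doingRollback 'false'"
)

if __name__ == "__main__":
    job_id, edge_id, minutes = publish_timeout_minutes(sample)
    print(f"{job_id} on {edge_id} ran for about {minutes:.1f} minutes before timing out")
```

For the sample entry, the difference is 1,696,551 ms, or roughly 28 minutes, which agrees with the "It has already taken 28 minutes" message in the exception.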
Environment
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.2.x
Cause
This issue occurs when an Edge virtual machine fails to initialize after being redeployed.
In addition, RPC timeout messages may be seen when the NSX Manager and Edge cannot communicate. Such communication occurs through the VIX channel if the Edge resides on a vSphere ESXi host which has not been prepared for NSX. If the ESXi host has been prepared, the communication occurs through the message bus channel.
Resolution
Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document, to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.
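Check the message bus status for the cluster by calling the following API, replacing MOID_OF_CLUSTER with the managed object ID of the cluster: https://<NSX_Manager_IP>/api/2.0/nwfabric/status?resource=MOID_OF_CLUSTER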
Note: A working message bus will return a green status.
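As an illustration of the API check above, the following minimal Python sketch builds the status URL and issues the request. It assumes HTTP basic authentication with NSX Manager credentials and a certificate that the client trusts. The function names, the example address, and the cluster ID are hypothetical. The response body is returned unparsed so that you can inspect the reported message bus status yourself.

```python
import base64
import urllib.parse
import urllib.request


def status_url(manager, cluster_moid):
    """Build the nwfabric status URL for a cluster managed object ID."""
    query = urllib.parse.urlencode({"resource": cluster_moid})
    return f"https://{manager}/api/2.0/nwfabric/status?{query}"


def fetch_status(manager, cluster_moid, user, password, timeout=30):
    """Return the raw response body of the status call.

    Assumption: the NSX Manager accepts HTTP basic authentication and
    presents a certificate that the local trust store accepts.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        status_url(manager, cluster_moid),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical values; substitute your NSX Manager address and cluster MOID.
    print(status_url("192.0.2.10", "domain-c7"))
```

Calling fetch_status with real credentials returns the status document for the cluster. The sketch deliberately does not disable certificate verification.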
Run this command on the NSX Edge to determine if the message bus is enabled and the RabbitMQ channels are listening:
The vmci_closed_by_peer counter records the number of times that the connection has been closed by the host agent. An incrementing value and vmci_conn: down status indicate that the host agent cannot connect to the RMQ broker. To validate this step further, run the show log follow command and search for messages similar to VmciProxy: [daemon.debug] VMCI Socket is closed by peer.
To check the health of the connections from the host side, use the esxcli network ip connection list | grep 5671 command.
~ # esxcli network ip connection list | grep 5671
tcp 0 0 #.#.#.#:43329 #.#.#.#:5671 ESTABLISHED 35854 newreno vsfwd
tcp 0 0 #.#.#.#:52667 #.#.#.#:5671 ESTABLISHED 35854 newreno vsfwd
tcp 0 0 #.#.#.#:20808 #.#.#.#:5671 ESTABLISHED 35847 newreno vsfwd
tcp 0 0 #.#.#.#:12486 #.#.#.#:5671 ESTABLISHED 35847 newreno vsfwd
If the connections to port 5671 are not shown as ESTABLISHED, collect the /var/log/vsfwd.log file and open a support request.
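If you capture the esxcli output from several hosts, the host-side check can be automated. The following minimal Python sketch assumes the column layout shown in the sample output above (protocol, two queue counters, local address, foreign address, state, and so on). The function names are illustrative only.

```python
def rmq_connection_states(output, port=5671):
    """Return the state of every TCP connection whose foreign address uses the given port."""
    states = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[4].endswith(f":{port}"):
            states.append(fields[5])
    return states


def all_established(output, port=5671):
    """True only if at least one matching connection exists and all are ESTABLISHED."""
    states = rmq_connection_states(output, port)
    return bool(states) and all(state == "ESTABLISHED" for state in states)


sample_output = """\
tcp 0 0 #.#.#.#:43329 #.#.#.#:5671 ESTABLISHED 35854 newreno vsfwd
tcp 0 0 #.#.#.#:52667 #.#.#.#:5671 ESTABLISHED 35854 newreno vsfwd
"""

if __name__ == "__main__":
    if all_established(sample_output):
        print("All message bus connections are ESTABLISHED")
    else:
        print("Collect /var/log/vsfwd.log and open a support request")
```

An empty result is treated as a failure, because a host with no connections to port 5671 is not communicating over the message bus.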
Note: NSX for vSphere release 6.1.5 resolves known publishing timeout issues by aggregating publishing jobs to improve performance.