NSX Edge Nodes Experience Network Performance Issues Caused by High CPU Ready Times
Article ID: 408235
Products
VMware NSX
Issue/Introduction
NSX Edge nodes are experiencing network performance issues caused by high CPU Ready times. Traffic passing through the edge infrastructure may experience degraded performance and connectivity problems due to CPU scheduling delays on the Edge VMs.
Symptoms include:
Network performance degradation for traffic through NSX Edge nodes
Intermittent connectivity issues affecting applications and services
Slowness in network traffic processing
Timeouts and increased latency for network connections
Check CPU Ready times on ESXi hosts running NSX Edge nodes using esxtop:
SSH to the ESXi host where NSX Edge nodes are running
Run the esxtop command (the CPU view opens by default; press c to return to it from another view)
Look for the NSX Edge VMs in the list
Check the %RDY column; values consistently above 8% indicate problematic CPU Ready times
Monitor for sustained high %RDY values during peak usage periods
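The interactive check above can also be run against captured data. esxtop has a batch mode (for example, esxtop -b -d 5 -n 60 > capture.csv) that writes its counters as CSV. The sketch below is a minimal illustration, not a supported tool: it assumes the CPU Ready columns are named in the form \\host\Group Cpu(<id>:<vm name>)\% Ready, which should be confirmed against a real capture, and the function names ready_stats and flag_high_ready are hypothetical.

```python
import csv
import io
import re
from statistics import mean

# Assumed column-name shape from esxtop batch mode; verify against a real capture.
READY_COLUMN = re.compile(r"\\Group Cpu\(\d+:(?P<vm>[^)]+)\)\\% Ready$")
READY_THRESHOLD = 8.0  # percent, the threshold used in this article


def ready_stats(csv_text):
    """Return {vm_name: (average %RDY, peak %RDY)} from esxtop batch CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Map column index -> VM name for every "% Ready" group column.
    columns = {}
    for index, name in enumerate(header):
        match = READY_COLUMN.search(name)
        if match:
            columns[index] = match.group("vm")

    samples = {vm: [] for vm in columns.values()}
    for row in reader:
        for index, vm in columns.items():
            if index < len(row) and row[index]:
                samples[vm].append(float(row[index]))

    return {vm: (mean(values), max(values)) for vm, values in samples.items() if values}


def flag_high_ready(stats, threshold=READY_THRESHOLD):
    """Return the sorted VM names whose average %RDY exceeds the threshold."""
    return sorted(vm for vm, (average, _peak) in stats.items() if average > threshold)
```

Passing the contents of a capture file to ready_stats and then to flag_high_ready gives the list of VMs to investigate. Averaging over a capture taken during peak hours is one way to approximate "consistently above 8%"; a stricter check could require every sample to exceed the threshold.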
Check for CPU reservation mismatches:
Verify Edge VM CPU reservation settings against available host MHz
Examine if hosts can honor the CPU reservations for Edge VMs
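The reservation check above comes down to simple arithmetic: the sum of CPU reservations on a host must not exceed the MHz the host can supply. The following sketch illustrates that comparison under simplifying assumptions: it uses raw capacity (cores multiplied by nominal clock speed) and ignores hypervisor overhead, so the real reservable capacity is somewhat lower. The Host class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cores: int          # physical cores on the ESXi host
    mhz_per_core: int   # nominal clock speed per core, in MHz

    @property
    def capacity_mhz(self):
        # Raw capacity; actual reservable capacity is lower (assumption noted above).
        return self.cores * self.mhz_per_core


def full_reservation_mhz(vcpus, mhz_per_core):
    """MHz needed to fully reserve every vCPU of a VM at the host's clock speed."""
    return vcpus * mhz_per_core


def reservation_headroom_mhz(host, reservations_mhz):
    """Host capacity minus the sum of reservations; negative means they cannot all be honored."""
    return host.capacity_mhz - sum(reservations_mhz)
```

For example, a host with 16 cores at 2400 MHz has a raw capacity of 38400 MHz. An Edge VM with 8 vCPUs fully reserved needs 19200 MHz, which leaves 19200 MHz for every other reservation on that host.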
Environment
VMware NSX Edge nodes
VMware vSphere ESXi hosts
Production backend applications
Load balancers distributing traffic through NSX
Cause
The issue is caused by severe CPU contention on ESXi hosts running NSX Edge nodes. When CPU Ready times are consistently above 8%, the Edge VMs experience CPU scheduling delays that cause packet processing issues, resulting in dropped packets and network performance degradation.
Contributing factors include:
Host oversubscription with Edge nodes competing for CPU resources with other workloads
CPU reservation mismatches where Edge VMs cannot obtain their reserved CPU resources
Edge nodes sharing infrastructure with application workloads causing resource contention
Resolution
Step 1: Isolate NSX Edge Node on Dedicated Host
Move the NSX Edge node to a dedicated ESXi host to eliminate CPU resource contention:
Identify an available ESXi host in the cluster or add a new host if needed
Use vMotion to migrate the NSX Edge VM to the dedicated host
Configure DRS VM-Host anti-affinity rules (for example, a "Must not run on hosts in group" rule) to prevent other VMs from being placed on the Edge host
Verify the dedicated host has sufficient CPU resources to handle Edge VM reservations
Monitor CPU Ready times after migration
Step 2: If High CPU Ready Times Persist - Upgrade Edge Node Size
If CPU Ready times remain consistently above 8% after isolation:
Resize the NSX Edge node to a larger form factor with more vCPUs
Increase CPU reservation appropriately for the larger Edge configuration
Ensure the dedicated host has sufficient resources for the upgraded Edge specifications
Monitor performance after the upgrade
Step 3: If Issues Continue - Offload Edge Node Processing
If CPU Ready times remain consistently above 8% after isolation and upgrade:
Review Edge node configuration and services running on the edge
Consider distributing edge services across multiple Edge nodes
Evaluate if certain services can be moved to other network infrastructure
Implement load balancing between multiple Edge nodes if applicable
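The three resolution steps form an escalation path: each step applies only if CPU Ready times remain above the 8% threshold after the previous one. The sketch below restates that logic as a small function for use in a runbook or monitoring script. The function name and the returned labels are illustrative and do not correspond to any NSX or vSphere API.

```python
READY_THRESHOLD = 8.0  # percent, the threshold used in this article


def next_resolution_step(ready_pct, isolated, upgraded):
    """Return the next action for an Edge VM given its sustained %RDY and the steps already taken."""
    if ready_pct <= READY_THRESHOLD:
        return "monitor"                      # no further action needed
    if not isolated:
        return "isolate on dedicated host"    # Step 1
    if not upgraded:
        return "upgrade edge node size"       # Step 2
    return "offload edge processing"          # Step 3
```

The value passed as ready_pct should be a sustained figure measured during peak usage, not a single sample.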
Expected Results:
CPU Ready times should drop to below 5% consistently
Improved network performance for traffic through NSX Edge nodes
Stable connectivity for applications utilizing Edge services
Elimination of timeouts and latency issues
Verification: After implementing the resolution steps, confirm the following:
Monitor CPU Ready times through esxtop (should be consistently below 5%)
Test network connectivity and performance through the Edge nodes
Verify application performance and connectivity stability
Confirm elimination of network timeouts and latency issues
If the issue persists after following these steps, contact Broadcom Support for further assistance.