NSX Edge Nodes Experience Network Performance Issues Caused by High CPU Ready Times

Article ID: 408235

Products

VMware NSX

Issue/Introduction

NSX Edge nodes can experience network performance issues caused by high CPU Ready times. CPU Ready is the time a VM's vCPU is ready to run but waits for the ESXi scheduler to give it a physical CPU. While the Edge VMs wait, traffic passing through the Edge infrastructure may see degraded performance and connectivity problems.

Symptoms include:

  • Network performance degradation for traffic through NSX Edge nodes
  • Intermittent connectivity issues affecting applications and services
  • Slowness in network traffic processing
  • Timeouts and increased latency for network connections
  • Packet processing delays impacting network throughput

Steps to validate:

Check CPU Ready times on ESXi hosts running NSX Edge nodes using esxtop:

  1. SSH to the ESXi host where NSX Edge nodes are running
  2. Run esxtop; it opens on the CPU panel by default (press c to switch back to it from another panel)
  3. Look for the NSX Edge VMs in the list
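  4. Check the %RDY column: values consistently above 8% indicate problematic CPU Ready times. The %RDY shown on a VM's row is summed across all of its worlds, including every vCPU; press e and enter the VM's GID to expand the row and view per-vCPU values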
  5. Monitor for sustained high RDY times during peak usage periods
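For longer captures, esxtop can also run in batch mode (for example, esxtop -b -d 5 -n 60 > edge-host.csv, which records 60 samples taken 5 seconds apart). The Python sketch below is one way to pull the %RDY samples for the Edge VMs out of that CSV and check whether they stay above the 8% threshold. It assumes that the batch-mode column header for a VM's ready time looks like \\<host>\Group Cpu(<gid>:<vm name>)\% Ready and that the Edge VM names share a common prefix. It also uses a hypothetical definition of "consistently" (at least 80% of samples above the threshold). Check the header format against your own capture and adjust both assumptions.

```python
import csv
import io
import re

# Assumed esxtop batch-mode column name for a VM's ready time, e.g.
#   \\esx01\Group Cpu(123456:nsx-edge-01)\% Ready
# Verify this against the header row of your own capture.
READY_COLUMN = re.compile(r"\\Group Cpu\(\d+:(?P<vm>[^)]+)\)\\% Ready$")


def ready_samples(csv_text, vm_prefix):
    """Return {vm_name: [%RDY samples]} for VMs whose name starts with vm_prefix."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = {}
    for index, name in enumerate(header):
        match = READY_COLUMN.search(name)
        if match and match.group("vm").startswith(vm_prefix):
            columns[match.group("vm")] = index
    # Skip any blank rows; convert each sample to a float percentage.
    return {vm: [float(row[i]) for row in data if row] for vm, i in columns.items()}


def consistently_above(samples, threshold=8.0, min_fraction=0.8):
    """True if at least min_fraction of the samples exceed threshold.

    min_fraction=0.8 is a hypothetical reading of "consistently"; tune it.
    """
    if not samples:
        return False
    high = sum(1 for s in samples if s > threshold)
    return high / len(samples) >= min_fraction
```

For example, ready_samples(open("edge-host.csv").read(), "nsx-edge") returns the samples for each Edge VM, and each list can then be passed to consistently_above(). The prefix "nsx-edge" and the file name are placeholders.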

Check for CPU reservation mismatches:

  • Verify the Edge VM CPU reservation settings against the MHz available on the host
  • Confirm that the hosts can honor the CPU reservations configured for the Edge VMs
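The reservation check is simple arithmetic, sketched below in Python. It assumes host capacity is physical cores multiplied by core MHz, minus a hypothetical overhead_pct allowance for hypervisor and system use. The real reservable figure is the one vSphere reports for the host, so substitute that value when you have it. All numbers in the example are illustrative.

```python
def edge_reservation_fits(physical_cores, core_mhz, edge_reservation_mhz,
                          other_reservations_mhz, overhead_pct=10):
    """Check whether an Edge VM's CPU reservation fits on a host.

    overhead_pct is a hypothetical allowance for hypervisor/system overhead;
    replace the estimate with the reservable capacity vSphere reports.
    Returns (fits, remaining_mhz_before_edge).
    """
    total_mhz = physical_cores * core_mhz
    # Integer arithmetic avoids floating-point rounding in the MHz figures.
    reservable_mhz = total_mhz * (100 - overhead_pct) // 100
    remaining_mhz = reservable_mhz - sum(other_reservations_mhz)
    return edge_reservation_mhz <= remaining_mhz, remaining_mhz
```

For example, edge_reservation_fits(16, 2600, 20000, [10000, 5000]) returns (True, 22440): a 16-core 2600 MHz host leaves roughly 37440 MHz reservable under the 10% assumption, and 22440 MHz after the other VMs' reservations.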

Environment

  • VMware NSX Edge nodes
  • VMware vSphere ESXi hosts
  • Production backend applications
  • Load balancers distributing traffic through NSX

Cause

The issue is caused by severe CPU contention on the ESXi hosts running NSX Edge nodes. When CPU Ready times are consistently above 8%, the Edge VMs spend a significant share of time waiting to be scheduled. This delays packet processing and can result in dropped packets and degraded network performance.
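If CPU Ready is tracked in vCenter performance charts instead of esxtop, the counter is reported as milliseconds of ready time per sample interval rather than as a percentage. A common conversion is percent = ready_ms * 100 / (interval_s * 1000), divided by the vCPU count for a per-vCPU figure, because the VM-level value adds up all of the VM's vCPUs. The Python sketch below applies that conversion so the result can be compared with the 8% threshold; the default 20-second interval matches real-time charts, and other chart ranges use longer intervals.

```python
def ready_ms_to_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a vCenter 'CPU Ready' summation (ms per sample) to a percentage.

    interval_s is the chart's sample interval: 20 s for real-time charts,
    300 s for the past-day rollup. Pass vcpus to get a per-vCPU figure,
    because the VM-level value adds up all of the VM's vCPUs.
    """
    # Multiply before dividing to keep results like 8.0 exact.
    return ready_ms * 100 / (interval_s * 1000 * vcpus)
```

For example, 1600 ms of ready time in one 20-second real-time sample is 8% (ready_ms_to_percent(1600) returns 8.0), and 6400 ms on a 4-vCPU VM is also 8% per vCPU (ready_ms_to_percent(6400, vcpus=4) returns 8.0).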

Contributing factors include:

  • Host oversubscription with Edge nodes competing for CPU resources with other workloads
  • CPU reservation mismatches where Edge VMs cannot obtain their reserved CPU resources
  • Edge nodes sharing infrastructure with application workloads causing resource contention

Resolution

Step 1: Isolate NSX Edge Node on Dedicated Host

Move the NSX Edge node to a dedicated ESXi host to eliminate CPU resource contention:

  1. Identify an available ESXi host in the cluster or add a new host if needed
  2. Use vMotion to migrate the NSX Edge VM to the dedicated host
  3. Configure DRS VM/Host rules (for example, a "Must Not Run on Hosts in Group" rule covering the other VMs in the cluster) to prevent other VMs from being placed on the Edge host
  4. Verify the dedicated host has sufficient CPU resources to handle Edge VM reservations
  5. Monitor CPU Ready times after migration
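After the migration and DRS rules are in place, a quick placement check can confirm that the Edge host really is dedicated and show its vCPU-to-physical-core ratio. The Python sketch below assumes a hypothetical inventory list of (VM name, host, vCPU count) records, for example exported from vCenter, and a hypothetical naming convention in which Edge VMs start with "nsx-edge".

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    host: str
    vcpus: int


def edge_host_report(vms, host, physical_cores, edge_prefix="nsx-edge"):
    """Return (non-Edge VMs on the host, vCPU-to-physical-core ratio).

    edge_prefix is a hypothetical naming convention for Edge VMs.
    """
    on_host = [vm for vm in vms if vm.host == host]
    intruders = sorted(vm.name for vm in on_host if not vm.name.startswith(edge_prefix))
    ratio = sum(vm.vcpus for vm in on_host) / physical_cores
    return intruders, ratio
```

For example, with nsx-edge-01 (8 vCPUs) and app-01 (4 vCPUs) both on a 16-core host esx01, edge_host_report returns (["app-01"], 0.75): app-01 still needs to be moved off the Edge host.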

Step 2: If High CPU Ready Times Persist - Upgrade Edge Node Size

If CPU Ready times remain consistently above 8% after isolation:

  1. Upgrade the NSX Edge node to a larger size with more vCPUs
  2. Increase CPU reservation appropriately for the larger Edge configuration
  3. Ensure the dedicated host has sufficient resources for the upgraded Edge specifications
  4. Monitor performance after the upgrade
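The sizing decision can be sketched as picking the next Edge form factor and checking that the dedicated host can hold it. The vCPU counts below (Small 2, Medium 4, Large 8, Extra Large 16) are the form factors commonly documented for recent NSX releases; confirm them for your version. Reserving the full vCPU count multiplied by core MHz is a hypothetical policy used only for illustration, and reservable_mhz should be the figure vSphere reports for the host.

```python
# vCPU counts per Edge VM form factor; confirm against your NSX release.
EDGE_VCPUS = {"small": 2, "medium": 4, "large": 8, "xlarge": 16}
SIZE_ORDER = ["small", "medium", "large", "xlarge"]


def next_size(current):
    """Return the next larger form factor, or None if already the largest."""
    i = SIZE_ORDER.index(current)
    return SIZE_ORDER[i + 1] if i + 1 < len(SIZE_ORDER) else None


def full_reservation_mhz(size, core_mhz):
    """Hypothetical policy: reserve every vCPU at full core speed."""
    return EDGE_VCPUS[size] * core_mhz


def fits_on_host(size, physical_cores, reservable_mhz, core_mhz):
    """True if the form factor needs no vCPU oversubscription and its reservation fits."""
    return (EDGE_VCPUS[size] <= physical_cores
            and full_reservation_mhz(size, core_mhz) <= reservable_mhz)
```

For example, moving from Medium gives next_size("medium") == "large", and a Large Edge on a 16-core 2600 MHz host needs a 20800 MHz reservation under this policy.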

Step 3: If Issues Continue - Offload Edge Node Processing

If CPU Ready times are still problematic after isolation and upgrade:

  1. Review Edge node configuration and services running on the edge
  2. Consider distributing edge services across multiple Edge nodes
  3. Evaluate if certain services can be moved to other network infrastructure
  4. Implement load balancing between multiple Edge nodes if applicable
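One way to reason about distributing services is a greedy balance: assign the heaviest services first, each to the currently least-loaded Edge node. In NSX, the unit you actually place is typically a gateway (such as a Tier-1 gateway hosting stateful services) on an Edge cluster, so treat "service" below as whatever unit you can move. The Python sketch assumes hypothetical relative load estimates, for example derived from throughput you have measured. This greedy heuristic is not guaranteed to find the optimal split.

```python
import heapq


def assign_services(service_load, edge_nodes):
    """Greedily assign services to Edge nodes, heaviest first, to the least-loaded node.

    service_load maps a hypothetical service name to an estimated relative load.
    Returns {edge_node: [service names]}.
    """
    heap = [(0.0, node) for node in edge_nodes]  # (current load, node name)
    heapq.heapify(heap)
    assignment = {node: [] for node in edge_nodes}
    # Sort by descending load, then by name so ties are deterministic.
    for service, load in sorted(service_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        assignment[node].append(service)
        heapq.heappush(heap, (total + load, node))
    return assignment
```

For example, assigning loads of 40, 30, 20, and 10 across two Edge nodes places the 40 and 10 loads on one node and the 30 and 20 loads on the other, giving each node a total of 50.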

Expected Results:

  • CPU Ready times should consistently stay below 5%
  • Improved network performance for traffic through NSX Edge nodes
  • Stable connectivity for applications utilizing Edge services
  • Elimination of timeouts and latency issues

Verification: After implementing the resolution steps:

  • Monitor CPU Ready times through esxtop (should be consistently below 5%)
  • Test network connectivity and performance through the Edge nodes
  • Verify application performance and connectivity stability
  • Confirm elimination of network timeouts and latency issues
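To compare connectivity before and after the change, a simple TCP connect probe run from a client whose traffic traverses the Edge can record failures (timeouts) and connect latency. The Python sketch below uses only the standard library; the target host and port (for example, a load balancer VIP behind the Edge) are placeholders you supply, and TCP connect time is only a rough proxy for application latency.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=10, timeout=2.0):
    """Open `attempts` TCP connections and summarize failures and connect times (ms)."""
    samples, failures = [], 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Connect, then close immediately; only the handshake is timed.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # includes timeouts and refused connections
            failures += 1
            continue
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "failures": failures,
        "median_ms": statistics.median(samples) if samples else None,
        "max_ms": max(samples) if samples else None,
    }
```

Running the same probe against the same target before and after the resolution steps gives a like-for-like comparison; zero failures and a stable median support the expected results above.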

If the issue persists after following these steps, contact Broadcom Support for further assistance.