How to Troubleshoot and Fix Packet Loss Related to High CPU %RDY on HCX Network Extensions

Article ID: 382170

Products

VMware HCX

Issue/Introduction

HCX Network Extension appliances may experience packet loss and high CPU ready times when handling high-density VLANs.

Symptoms include:

  • Consistent packet loss on extended network segments
  • High CPU ready times (above 10%) on Network Extension appliances
  • Application connectivity issues between on-premises and cloud environments
  • Latency and occasional timeouts in network responses

Verifying High CPU Ready Times

To verify high CPU ready times using esxtop:

  1. SSH to the ESXi host where the Network Extension appliance is running
  2. Run the esxtop command
  3. Look for the Network Extension VM in the list (press V to show only virtual machines)
  4. Check the %RDY column. Values consistently above 10% indicate high CPU ready times
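
The check above can be sketched in Python as a small helper that summarizes %RDY readings noted from esxtop for one appliance. This is an illustrative sketch, not part of the documented procedure: the function name, the sample values, and the `min_fraction` definition of "consistently" (80% of samples above the threshold) are assumptions; only the 10% threshold comes from this article.

```python
from statistics import mean

RDY_THRESHOLD_PCT = 10.0  # threshold stated in this article


def summarize_ready(samples, threshold=RDY_THRESHOLD_PCT, min_fraction=0.8):
    """Summarize %RDY readings noted from esxtop for one VM.

    min_fraction is an assumed definition of "consistently": the share of
    samples that must exceed the threshold. Adjust it to your own policy.
    """
    if not samples:
        raise ValueError("at least one %RDY sample is required")
    # Count the readings that exceed the threshold.
    above = sum(1 for s in samples if s > threshold)
    fraction = above / len(samples)
    return {
        "average": mean(samples),
        "peak": max(samples),
        "fraction_above": fraction,
        "consistently_high": fraction >= min_fraction,
    }


if __name__ == "__main__":
    readings = [12.4, 15.1, 9.8, 13.0, 11.7]  # illustrative values, not real data
    print(summarize_ready(readings))
```

Taking several readings over a few minutes, ideally during peak traffic, helps separate a sustained condition from a short spike.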

Environment

  • VMware HCX Network Extension appliances
  • High-density VLAN environments

Cause

Two primary factors contribute to packet loss and high CPU ready times on HCX Network Extensions:

  1. CPU Thread Limitation: The default network adapter context setting (ctxPerDev=1) limits the number of CPU threads that can simultaneously process network traffic for extended network segments. In high-density environments, this can lead to processing bottlenecks, resulting in packet loss and elevated CPU ready times.
  2. Network Load: High volumes of traffic crossing the Network Extension, particularly in environments with:
    • Large numbers of VMs communicating across the extension
    • Applications causing frequent cross-site traffic
    • Network-intensive workloads spanning both sites
    • Sub-optimal placement of interdependent VMs on different sides of the extension

Resolution

The two causes above can act independently or compound each other, leading to performance degradation. The solution may require both optimizing CPU thread allocation and reducing network load through strategic workload placement or by enabling features such as Mobility Optimized Networking (MON).

Before making any configuration changes:

  1. Contact Broadcom Support for HCX to confirm the high CPU ready and packet loss issues, and reference this article in the support case
  2. If possible, first attempt to:
    • Extend separate VLANs to distribute the load
    • Move VMs to the local side of the network to reduce traffic
    • Enable Mobility Optimized Networking (MON)

If these changes are not possible or do not resolve the issue, increase the number of CPU threads available for network processing through the ctxPerDev setting. Because limited CPU thread allocation can cause both the packet loss and the high CPU ready times, raising this value distributes the network processing load across more threads and can improve performance.

Modify the network adapter context settings for the Network Extension appliances using the following procedure:

  1. In the vSphere Client, locate the Network Extension appliance VM
  2. Shut down the Network Extension appliance VM
  3. Right-click the VM and select "Edit Settings"
  4. Confirm which network adapter is used for the extended network segment in the VM's network adapter settings
  5. Click the "VM Options" tab
  6. Expand the "Advanced" section
  7. Click "Edit Configuration" next to "Configuration Parameters"
  8. In the Name column's search field, type "ctxPerDev" to filter the parameters
  9. Locate the ethernetX.ctxPerDev entry corresponding to your extended network adapter
  10. Change only the value from 1 to 3 for the specific ethernet adapter used for the extended network
  11. Click "OK" to save the configuration
  12. Power on the Network Extension appliance VM
  13. Repeat steps 1-12 for the other Network Extension appliances on both sides of the network extension
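
Because the procedure changes exactly one parameter, it can help to record the configuration parameters before and after the edit and confirm that nothing else moved. The following Python sketch illustrates that check on values transcribed by hand from the "Configuration Parameters" dialog. It does not connect to vCenter or modify any VM; the function names are hypothetical, and the adapter number used in the example (ethernet1) is an assumption that must be replaced with the adapter confirmed in step 4.

```python
def plan_ctx_per_dev_change(params, adapter_index, new_value="3"):
    """Return a copy of the parameters with only one adapter's ctxPerDev changed.

    params: dict of configuration parameter names to string values, as
            transcribed from the "Configuration Parameters" dialog.
    adapter_index: the X in ethernetX for the extended-network adapter.
    """
    key = f"ethernet{adapter_index}.ctxPerDev"
    if key not in params:
        raise KeyError(f"{key} not found; confirm the adapter number first")
    updated = dict(params)  # copy so the original snapshot is left untouched
    updated[key] = new_value
    return updated


def changed_keys(before, after):
    """List the parameter names whose values differ between two snapshots."""
    return sorted(k for k in after if before.get(k) != after[k])


if __name__ == "__main__":
    # Hypothetical snapshot; the real adapter count and numbering may differ.
    current = {
        "ethernet0.ctxPerDev": "1",
        "ethernet1.ctxPerDev": "1",
        "ethernet2.ctxPerDev": "1",
    }
    planned = plan_ctx_per_dev_change(current, adapter_index=1)
    print(changed_keys(current, planned))  # ['ethernet1.ctxPerDev']
```

If `changed_keys` reports more than one entry when comparing real before and after snapshots, an adapter that does not carry the extended network was probably modified as well.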

This modification optimizes CPU thread utilization for network processing. The setting of ctxPerDev=3 is recommended for high-density environments. Continue monitoring performance during peak usage periods to validate the change.

If issues persist after the ctxPerDev changes, revisit the options of extending separate VLANs, moving VMs to the local side, or enabling MON to further reduce network load.

WARNING: Modifying ctxPerDev values for ethernet adapters not used for extended networks can lead to unnecessary resource consumption and potential performance degradation. Only modify the specific ethernet adapter used for the extended network segment.

Note: Modify the Network Extension appliances during a maintenance window, because the procedure requires shutting down each appliance and interrupts traffic on its extended networks.

Expected Results

After implementing these changes:

  • Packet loss should be significantly reduced or eliminated
  • CPU ready times should stabilize between 5% and 10%
  • Network performance should improve for extended network segments

Verification

Monitor the following metrics after implementation:

  1. CPU Ready times through esxtop
  2. Packet loss through ping tests or network monitoring tools
  3. Application connectivity and performance across extended segments
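
For the ping-based check, packet loss can be read from the summary line that ping prints when it finishes. The sketch below is one possible way to extract that figure in Python so results can be compared before and after the change. It assumes the summary format printed by common Linux and macOS ping implementations; the function name, the sample output, and the address 10.0.0.5 are illustrative only.

```python
import re

# Matches summary lines such as:
#   "100 packets transmitted, 97 received, 3% packet loss, time 99145ms"
#   "5 packets transmitted, 5 packets received, 0.0% packet loss"
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,.*?([\d.]+)% packet loss"
)


def parse_ping_loss(output):
    """Extract sent, received, and loss percentage from ping output text."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return {
        "sent": int(match.group(1)),
        "received": int(match.group(2)),
        "loss_pct": float(match.group(3)),
    }


if __name__ == "__main__":
    # Illustrative output, not captured from a real system.
    sample = (
        "--- 10.0.0.5 ping statistics ---\n"
        "100 packets transmitted, 97 received, 3% packet loss, time 99145ms\n"
    )
    print(parse_ping_loss(sample))
```

Running the same test between the same pair of endpoints across the extension, before and after the change and during comparable traffic periods, makes the results easier to compare.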