Edge Datapath CPU very high alarm

Article ID: 330483

Products

VMware NSX

Issue/Introduction

Title: Alarm for Edge Datapath CPU usage very high
Event ID: edge_health.edge_datapath_cpu_very_high
Alarm Description

  • Purpose: Indicates Edge Datapath CPU usage is high
  • Impact: Rx drops will be observed when usage reaches 100%

Environment

VMware NSX-T Data Center

Edge Form factors:

  • Bare Metal Edge
  • VM Edge
 

Cause

Reason for very high CPU usage: 

  • Current CPU usage on the Edge node can be obtained with the 'get dataplane cpu stats' Edge CLI command, which shows the packets per second and the CPU utilization for each CPU core. 100% CPU usage means that one or all CPUs have reached their maximum capacity.

    Sample output for get dataplane cpu stats:

  • One possible reason is that the traffic rate is at 100% of what the CPU can process.
  • CPU usage also increases when there is a large number of fragmented packets. Checking the MTU size along the path and adjusting the packet size can help reduce fragmentation.
  • The number of fragmented packets on a Logical router interface can be obtained using the 'get gateway interface <Logical router interface UUID> stats' Edge CLI command. The Logical router interface UUID is obtained using the 'get interface' Edge CLI command under the Logical router VRF.
  • CPU usage may be high only on a subset of CPUs if the traffic is getting hashed only to that subset of CPUs. 
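
To illustrate why the MTU along the path matters for fragmentation, the following minimal sketch estimates how many IPv4 fragments a packet produces for a given MTU. It assumes plain IPv4 with a 20-byte header (no IP options) and ignores any overlay encapsulation overhead; the function name and the example sizes are hypothetical and are not part of any NSX tooling.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def ipv4_fragment_count(packet_bytes: int, mtu: int,
                        header_bytes: int = IPV4_HEADER_BYTES) -> int:
    """Estimate the number of IPv4 fragments for one packet.

    packet_bytes is the total IP packet length, header included.
    """
    if packet_bytes <= mtu:
        return 1  # fits in a single packet, no fragmentation

    payload = packet_bytes - header_bytes
    # Every fragment except the last must carry a payload that is a
    # multiple of 8 bytes, so round the usable space down accordingly.
    per_fragment = (mtu - header_bytes) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    for size, mtu in [(1500, 1500), (1600, 1500), (9000, 1500)]:
        print(f"{size}-byte packet, MTU {mtu}: "
              f"{ipv4_fragment_count(size, mtu)} fragment(s)")
```

Each extra fragment is an extra packet for the datapath to handle, so a sender using a packet size slightly above the path MTU roughly doubles the packet count for the same payload.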

    Sample output for get interface under a given VRF:



    Sample output for get gateway interface:

Resolution

Steps to Resolve
For 3.0.0 and higher

Recommended Action: 

  • Collect the support bundle when the alarm is raised.
  • Consider increasing the Edge appliance form factor size and rebalancing services on this Edge node to other Edge nodes in the same cluster or other Edge clusters.
  • Higher CPU usage is expected with higher packet rates. If the packet rate on the Edge node is low while CPU usage is high, check whether flow-cache is disabled by invoking the 'get dataplane flow-cache config' Edge CLI command. If it is disabled, consider re-enabling it using the command 'set dataplane flow-cache enabled' followed by 'restart service dataplane' (Note: restarting the dataplane service causes a momentary disruption in traffic).

    Sample output for get dataplane flow-cache config:

Maintenance window required for remediation? Yes
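
The recommended actions above can be sketched as a small triage function. This is only an illustration of the decision logic, not a supported tool: it assumes the per-core CPU usage and packet rates have already been read manually from 'get dataplane cpu stats' and the flow-cache state from 'get dataplane flow-cache config'. The thresholds VERY_HIGH_PCT and LOW_PPS are hypothetical values chosen for the example, as are all function names.

```python
from typing import Dict, List

VERY_HIGH_PCT = 95.0  # hypothetical "very high" usage threshold (percent)
LOW_PPS = 10_000      # hypothetical "low packet rate" cutoff per core


def classify_cores(usage_by_core: Dict[int, float],
                   threshold: float = VERY_HIGH_PCT) -> str:
    """Return 'normal', 'subset_saturated' or 'all_cores_saturated'."""
    hot = [c for c, pct in usage_by_core.items() if pct >= threshold]
    if not hot:
        return "normal"
    if len(hot) == len(usage_by_core):
        return "all_cores_saturated"
    return "subset_saturated"


def recommend(usage_by_core: Dict[int, float],
              pps_by_core: Dict[int, int],
              flow_cache_enabled: bool) -> List[str]:
    """Map observed values to the recommended actions in this article."""
    state = classify_cores(usage_by_core)
    if state == "normal":
        return []

    actions = ["Collect the support bundle while the alarm is raised."]
    hot = [c for c, pct in usage_by_core.items() if pct >= VERY_HIGH_PCT]

    # High CPU usage at a low packet rate points at a disabled flow-cache.
    low_rate_hot = [c for c in hot if pps_by_core.get(c, 0) < LOW_PPS]
    if low_rate_hot and not flow_cache_enabled:
        actions.append("Re-enable flow-cache and restart the dataplane "
                       "service in a maintenance window.")

    if state == "subset_saturated":
        actions.append(f"Traffic may be hashed only to cores {sorted(hot)}.")
    else:
        actions.append("Consider a larger Edge form factor or rebalancing "
                       "services to other Edge nodes.")
    return actions


if __name__ == "__main__":
    for line in recommend({0: 99.0, 1: 12.0}, {0: 500, 1: 300}, False):
        print("-", line)
```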

Additional Information