NSX-T Bare Metal Edge Node CPU core 0 or 1 dropping traffic

Article ID: 312599

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • On the NSX-T Bare Metal Edge Node, the admin CLI command get datapath cpu stats shows high CPU utilization on core 0 or 1.
  • In the NSX-T Bare Metal Edge Node log file /var/run/vmware/edge/cpu_usage.json, the following is observed:
    "highest_cpu_core_usage_dpdk": 77.88,
    "dpdk_cpu_per_core": {
        "0": 77.88, <<<<< high CPU usage on core 0
        "1": 0.03,
        "2": 0.01,
        "3": 0.04,
        "4": 0.05,
        "5": 0.01,
        "6": 0.01,
        "7": 0.01,
        "8": 0.01,
        "9": 0.02,
        "10": 0.01,
        "11": 0.02
    },
  • Packet drops are seen for traffic processed by CPU core 0 or 1, while other CPU cores handle a similar amount of traffic without any drops.
  • Other services running on the affected CPU core (core 0 in the example above), such as BGP or LACP, may also be impacted.
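The per-core check described above can be sketched as a small shell helper. This is a minimal sketch, not an official tool: the function name hot_cores and the 50 percent threshold in the usage example are assumptions chosen for illustration, and the parser assumes one "core": value pair per line inside the dpdk_cpu_per_core block, as in the excerpt above.

```shell
#!/bin/sh
# hot_cores THRESHOLD [FILE]
# Print "<core> <usage>" for every entry in the dpdk_cpu_per_core block
# whose usage is at or above THRESHOLD (percent).
# Reads FILE, or standard input when FILE is omitted.
# Hypothetical helper: assumes one "core": value pair per line.
hot_cores() {
    threshold=$1
    shift
    awk -v t="$threshold" '
        /"dpdk_cpu_per_core"/ { in_block = 1; next }  # start of per-core map
        in_block && /}/       { in_block = 0 }        # end of per-core map
        in_block {
            line = $0
            gsub(/[",]/, "", line)                    # drop quotes and commas
            n = split(line, kv, ":")
            if (n >= 2 && kv[2] + 0 >= t + 0) {
                core = kv[1]
                gsub(/[ \t]/, "", core)               # trim indentation
                printf "%s %s\n", core, kv[2] + 0
            }
        }
    ' "$@"
}
```

For example, hot_cores 50 cpu_usage.json lists only the cores at or above 50 percent usage, which makes a single overloaded fastpath core easy to spot.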

Environment

VMware NSX-T Data Center
VMware NSX

Cause

The kni_single kernel thread handles communication between user space and kernel space.
This thread runs on a datapath fastpath core (core 0 or 1). When that core is under high CPU utilization, workload network traffic that is hashed to it may be dropped.

Resolution

This is a known issue and is resolved in NSX-T 3.2.3, NSX 4.1.1, and later releases.

Workaround:
The workaround is to move the kni_single kernel thread to a non-datapath CPU core.

SSH as root to the NSX-T Bare Metal Edge Node.

1. List the available CPU cores (run as the root user):


root@Edge-1:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7 <<<<<<
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
Stepping: 4
CPU MHz: 1995.312
BogoMIPS: 3990.62
Virtualization: VT-x
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0-7

2. Log in as the admin user and check which CPU cores are used by the dataplane:


Edge-1> get dataplane
Accept_ra : False
Bfd_ring_size : 512
Bitw_mode : False
Corelist : 0,1,2,3,4,5 <<<<< cores used for dataplane

In this example, cores 6 and 7 are not used by the dataplane.
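The comparison between the on-line CPU list from lscpu and the dataplane Corelist can be done by hand, or sketched in shell as below. The function names expand_cpu_list and non_dataplane_cores are hypothetical helpers, not part of the NSX CLI, and the two input values must be copied manually from the command outputs.

```shell
#!/bin/sh
# expand_cpu_list LIST
# Expand a CPU list such as "0-3,6" into one CPU number per line.
expand_cpu_list() {
    printf '%s\n' "$1" | tr -d ' \t' | tr ',' '\n' |
    while IFS=- read -r start end; do
        [ -n "$start" ] || continue
        seq "$start" "${end:-$start}"   # single CPU when no range end is given
    done
}

# non_dataplane_cores ONLINE_LIST DATAPLANE_CORELIST
# Print, comma-separated, the on-line CPUs that are not in the dataplane corelist.
non_dataplane_cores() {
    used=",$(expand_cpu_list "$2" | paste -sd, -),"
    expand_cpu_list "$1" | while read -r cpu; do
        case "$used" in
            *",$cpu,"*) ;;              # used by the dataplane, skip
            *) echo "$cpu" ;;
        esac
    done | paste -sd, -
}

# Values taken from the example outputs in steps 1 and 2.
non_dataplane_cores "0-7" "0,1,2,3,4,5"
```

With the example values from steps 1 and 2, the remaining cores are the candidates for the kni_single thread.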

3. Log in as root and get the PID of the kni_single kernel thread:


root@edge02:~# ps -aux|grep -i [k]ni_single
root      7128  4.1  0.0      0     0 ?        S    Aug12 243:18 [kni_single]
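If the PID is needed in a script rather than read by eye, it can be extracted from the same ps output. The sketch below is one possible approach; kni_pid_from_ps is a hypothetical helper name, and it assumes the thread appears with the command column [kni_single] exactly as in the output above.

```shell
#!/bin/sh
# kni_pid_from_ps
# Read `ps aux` style lines on standard input and print the PID (column 2)
# of every line whose last column is exactly "[kni_single]".
# Matching on the whole column avoids picking up unrelated processes
# that merely mention kni_single in their arguments.
kni_pid_from_ps() {
    awk '$NF == "[kni_single]" { print $2 }'
}
```

A typical invocation would be ps aux | kni_pid_from_ps, with the result stored in a variable for the taskset commands in the following steps.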

4. Use the "taskset" command to list the current CPU affinity for the kni_single process:


root@edge02:~# taskset -pc 7128
7128's current affinity list: 0-3

Note: PID 7128 is an example; the PID will be different in your environment.

5. Use the taskset command to set the CPU affinity of kni_single to non-datapath cores:


root@edge02:~# taskset -pc 6-7 7128
7128's current affinity list: 0-3
7128's new affinity list: 6,7

Note: PID 7128 and the core numbers are examples; both may be different in your environment.

6. Verify that the change was made:


root@edge02:~# taskset -pc 7128
7128's current affinity list: 6,7
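The verification in step 6 amounts to confirming that the new affinity list shares no core with the dataplane Corelist from step 2. A minimal sketch of that check follows; affinity_from_taskset, expand_cpu_list, and overlapping_cores are hypothetical helper names, and the sketch assumes the taskset output format shown above.

```shell
#!/bin/sh
# affinity_from_taskset
# Extract the CPU list from `taskset -pc PID` output read on standard input.
affinity_from_taskset() {
    sed -n 's/^.*current affinity list: *//p'
}

# expand_cpu_list LIST
# Expand a CPU list such as "0-3,6" into one CPU number per line.
expand_cpu_list() {
    printf '%s\n' "$1" | tr -d ' \t' | tr ',' '\n' |
    while IFS=- read -r start end; do
        [ -n "$start" ] || continue
        seq "$start" "${end:-$start}"
    done
}

# overlapping_cores AFFINITY_LIST DATAPLANE_CORELIST
# Print, comma-separated, the CPUs present in both lists.
# Empty output means the thread no longer shares a core with the dataplane.
overlapping_cores() {
    used=",$(expand_cpu_list "$2" | paste -sd, -),"
    expand_cpu_list "$1" | while read -r cpu; do
        case "$used" in
            *",$cpu,"*) echo "$cpu" ;;  # core is shared with the dataplane
        esac
    done | paste -sd, -
}
```

For example, overlapping_cores "$(taskset -pc 7128 | affinity_from_taskset)" "0,1,2,3,4,5" prints nothing once the thread has been moved to cores 6 and 7, and prints the shared cores if the change did not take effect.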

Note:
The command outputs and values above are examples only and may vary depending on the environment.
This workaround does not persist across reboots. For any issues or assistance with this workaround, contact Broadcom Support and note this Article ID (312599) in the problem description.