NSX-T Bare Metal Edge Node CPU core 0 or 1 dropping traffic

Article ID: 312599

Products

VMware NSX

Issue/Introduction

Symptoms:

  • The NSX-T Bare Metal Edge node is on version 3.1.x or lower.
  • On the NSX-T Bare Metal Edge Node, the admin CLI command get datapath cpu stats shows high CPU utilization on core 0 or 1.
  • On the NSX-T Bare Metal Edge Node, the file /var/run/vmware/edge/cpu_usage.json shows output similar to the following:
    "highest_cpu_core_usage_dpdk": 77.88,
    "dpdk_cpu_per_core": {
        "0": 77.88, <<<<<< high CPU usage on this core
        "1": 0.03,
        "2": 0.01,
        "3": 0.04,
        "4": 0.05,
        "5": 0.01,
        "6": 0.01,
        "7": 0.01,
        "8": 0.01,
        "9": 0.02,
        "10": 0.01,
        "11": 0.02
    },
  • Packet drops are seen for traffic processed by CPU core 0 or 1 while other CPU cores handle a similar amount of traffic without any drop.
  • Other services running on the affected CPU core (core 0 in the example above), such as BGP or LACP, may also be impacted.
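The per-core check described above can be scripted. The following is a minimal sketch, not an official tool: the function name hot_cores and the 50% threshold are hypothetical, and it assumes the JSON file lists one core per line, as in the sample output above.

```shell
# Print every DPDK core whose usage exceeds a threshold.
# $1 = threshold in percent (hypothetical value, choose your own)
# JSON text is read from stdin; assumes one "core": value pair per line.
hot_cores() {
    awk -v limit="$1" '
        /^[ \t]*"[0-9]+"[ \t]*:/ {
            gsub(/[",:]/, " ")    # turn  "0": 77.88,  into  0 77.88
            if ($2 + 0 > limit + 0) print "core " $1 ": " $2 "%"
        }'
}

# Example with a shortened copy of the sample output; prints "core 0: 77.88%".
hot_cores 50 <<'EOF'
    "dpdk_cpu_per_core": {
        "0": 77.88,
        "1": 0.03
    },
EOF

# On an Edge node the file itself would be used as input, for example:
#   hot_cores 50 < /var/run/vmware/edge/cpu_usage.json
```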

Environment

VMware NSX-T Data Center

Cause

The kni_single kernel thread handles communication between user space and kernel space.
This thread runs on a datapath fastpath core (that is, core 0 or 1). If workload network traffic is hashed to CPU core 0 or 1, it may be dropped because of the high CPU utilization on that core.

Resolution

This is a known issue affecting NSX-T Data Center.

Workaround:
The workaround is to move the kni_single kernel thread to a non-datapath CPU core.
SSH to the NSX-T Bare Metal Edge Node as root.

1. List the available CPU cores (run as the root user):

root@Edge-1:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7 <<<<<<
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
Stepping: 4
CPU MHz: 1995.312
BogoMIPS: 3990.62
Virtualization: VT-x
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0-7


2. Log in as the admin user and check which CPU cores are used by the dataplane:
Edge-1> get dataplane
Accept_ra : False
Bfd_ring_size : 512
Bitw_mode : False
Corelist : 0,1,2,3,4,5 <<<<< cores used for dataplane


In this example cores 6 and 7 are not used for the dataplane.
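The comparison between the lscpu output and the Corelist can also be done with a small helper. This is a sketch only: the function name non_dataplane_cores is hypothetical, and it assumes the cores are numbered 0 to N-1, where N is the "CPU(s)" value from lscpu.

```shell
# Print the CPU cores that are NOT in the dataplane Corelist.
# $1 = number of CPUs (the "CPU(s)" value from lscpu)
# $2 = comma-separated Corelist from "get dataplane"
non_dataplane_cores() {
    total=$1
    corelist=$2
    free=""
    core=0
    while [ "$core" -lt "$total" ]; do
        # Wrap both sides in commas so "1" does not match inside "10" or "11".
        case ",$corelist," in
            *",$core,"*) ;;                      # dataplane core, skip it
            *) free="${free:+$free,}$core" ;;    # free core, append it
        esac
        core=$((core + 1))
    done
    echo "$free"
}

# Values from the example above; prints 6,7
non_dataplane_cores 8 "0,1,2,3,4,5"
```

The result is already in the comma-separated form that taskset accepts as a CPU list.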

3. Log in as root and get the PID of the kni_single kernel thread:

root@edge02:~# ps -aux|grep -i [k]ni_single
root      7128  4.1  0.0      0     0 ?        S    Aug12 243:18 [kni_single]


4. Use the "taskset" command to list the current CPU affinity for the kni_single process:
root@edge02:~# taskset -pc 7128
7128's current affinity list: 0-3


Note: The PID (7128 in this example) will be different in your environment.

5. Use the taskset command to set the CPU affinity of kni_single to non-datapath cores:
root@edge02:~# taskset -pc 6-7 7128
7128's current affinity list: 0-3
7128's new affinity list: 6,7


Note: The PID (7128 in this example) and the core numbers will be different in your environment.

6. Verify that the change was made:
root@edge02:~# taskset -pc 7128
7128's current affinity list: 6,7


Note:
The command outputs and values above are examples only and may vary depending on your environment.
This workaround does not persist across reboots.
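
Steps 4 to 6 can be combined into one helper, sketched below. This is an illustration, not a supported script: the function name move_kni_single and the RUN dry-run variable are hypothetical, and the commented pgrep lookup assumes pgrep is available on the Edge node as an alternative to the ps and grep pipeline in step 3.

```shell
# Show, change, and verify the CPU affinity of a thread.
# $1 = PID of the kni_single thread, $2 = target CPU list (for example 6,7)
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only printed.
move_kni_single() {
    pid=$1
    cores=$2
    ${RUN:-} taskset -pc "$pid"             # step 4: list the current affinity
    ${RUN:-} taskset -pc "$cores" "$pid"    # step 5: set the new affinity
    ${RUN:-} taskset -pc "$pid"             # step 6: verify the change
}

# Dry run with the example values from this article; only prints the commands.
RUN=echo move_kni_single 7128 6,7

# Real run as root (assumes pgrep is installed and finds exactly one thread):
#   pid=$(pgrep -x kni_single) && move_kni_single "$pid" 6,7
```

Running the dry run first makes it possible to confirm the PID and core list before any affinity is changed.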

If you have issues with this workaround, please contact Broadcom Support and note this Article ID (312599) in the problem description.

Attachments

set_kni_affinity_cron
set_kni_affinity