Services on VMware NSX Bare Metal edge node may experience low throughput and rx_misses

Article ID: 322649


Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are running a version of VMware NSX earlier than 3.2.3 or 4.1.1.
  • Services running on a Bare Metal Edge (BME) may experience dataplane impact, and throughput may be lower than expected.
  • Affected interfaces show rx_misses:
> get interface fp-ethX 
...
Interface: fp-ethX
...

RX misses: 123456
  • In the edge syslog we see the following log messages at startup:
bm-edge-1 systemd 1 - - Started Edge NSD.
bm-edge-1 NSX 6619 - [nsx@6876 comp="nsx-edge" subcomp="nsd" tid="6619" level="INFO"] Set interface kni-lrport-0 state Up


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center 4.x

Cause

When an edge node boots up, a service called NSD (Namespace Daemon) monitors the creation of the kni-lrport-0 interface. When NSD detects that kni-lrport-0 has been created, it triggers a script called ip.py, which configures the CPU affinity for the KNI thread. In certain circumstances, when the nestdb-server on the edge node is busy, the NSD service fails to receive the status of kni-lrport-0 and ip.py is not run.

For example in a working setup, in the edge syslog we see the following log messages:
### process NSD is started. NSD polls for interface changes every 500ms.
bm-edge-1 systemd 1 - - Started Edge NSD.
### process datapathd is started. datapathd creates interface kni-lrport-0, which is detected by NSD
bm-edge-1 NSX 8298 - [nsx@6876 comp="nsx-edge" subcomp="nsd" tid="8298" level="INFO"] Set interface kni-lrport-0 MAC 02:50:56:xx:xx:xx
### The message below is the key one: it triggers the ip.py script.
bm-edge-1 NSX 8298 - [nsx@6876 comp="nsx-edge" subcomp="nsd" tid="8298" level="INFO"] Detected parent kni interface kni-lrport-0 creation state up, ifindex 11
### script ip.py is called. The script configures the CPU cores for the KNI thread
bm-edge-1 NSX 10730 - [nsx@6876 comp="nsx-edge" subcomp="nsd" tid="10730" level="INFO"] ExecuteCmd (Child): Running: ' /usr/bin/sudo /opt/vmware/nsx-edge/bin/ip.py configure-cpu --kni kni-lrport-0', env: ''

In a setup where the script is not called, only the following log entries are seen (matching the Symptoms above):
bm-edge-1 systemd 1 - - Started Edge NSD.
bm-edge-1 NSX 6619 - [nsx@6876 comp="nsx-edge" subcomp="nsd" tid="6619" level="INFO"] Set interface kni-lrport-0 state Up
Note: The log message 'Detected parent kni interface kni-lrport-0 creation state up' is not logged, unlike in the working log messages above. As a result, the ip.py script is not run, and the KNI thread CPU affinity is not configured.
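As a quick check on a suspect edge node, the syslog can be searched for these two messages. The sketch below is a minimal example, assuming the relevant entries are still in /var/log/syslog (adjust the path if the logs have been rotated):
### Run as root on the BME; the /var/log/syslog path is an assumption
bm-edge-1:~# grep 'Detected parent kni interface kni-lrport-0' /var/log/syslog
bm-edge-1:~# grep 'ip.py configure-cpu --kni kni-lrport-0' /var/log/syslog
### If neither command returns a match since the last boot, NSD did not trigger ip.py and this issue is likely present.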

Step 1. To check whether the CPU affinity is configured for the KNI thread, run the following as admin on the BME:
bm-edge-1> get dataplane | find Corelist
Corelist : 0,4,8,12,10,6,2,16,20,24,22,18,14,28,32,36,38,34,30,26,40,44,48,50,46,42
Step 2. As root, run 'ps aux | grep kni' to get the process ID (PID) of the KNI thread:
bm-edge-1:~# ps aux | grep kni
root 17901 97.8 0.0 0 0 ? R Aug10 12648:14 [kni_single]
Step 3. Then run the command below with the PID found in the previous step:
bm-edge-1:~# taskset -apc 17901
pid 17901's current affinity list: 0-51
If the CPU affinity for the KNI thread is configured properly, the CPU cores in Step 1 and Step 3 should not overlap. In the example above, the affinity list 0-51 includes every core in the Step 1 Corelist, which indicates the affinity was never configured.

Alternatively, steps 2 and 3 can be combined into the single command: taskset -pc $(pidof kni_single)
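
The comparison can also be scripted. The sketch below is a rough example, run as root, which assumes the Corelist value from Step 1 has been pasted into the CORELIST variable; it prints any core that appears both in the datapath Corelist and in the kni_single affinity list:
### Hedged sketch: paste the Corelist value from Step 1 into CORELIST first
bm-edge-1:~# CORELIST="0,4,8,12,10,6,2,16,20,24,22,18,14,28,32,36,38,34,30,26,40,44,48,50,46,42"
bm-edge-1:~# AFFINITY=$(taskset -pc $(pidof kni_single) | awk -F': ' '{print $2}')
### Expand a range such as 0-51 into individual core numbers, then intersect with the Corelist
bm-edge-1:~# EXPANDED=$(eval echo $(echo "$AFFINITY" | sed 's/\([0-9]\+\)-\([0-9]\+\)/{\1..\2}/g' | tr ',' ' '))
bm-edge-1:~# comm -12 <(echo "$CORELIST" | tr ',' '\n' | sort) <(echo "$EXPANDED" | tr ' ' '\n' | sort)
### Any core number printed above is an overlap, meaning the KNI thread may run on a DPDK core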

Resolution

This issue is resolved in VMware NSX 3.2.3 and 4.1.1, available at VMware downloads.

Workaround:
There are two workarounds:
1. Place the impacted BME(s) into and then out of maintenance mode, which essentially restarts the datapath service. This may need to be repeated multiple times.

2. Use the following steps to manually configure the CPU affinity for the KNI thread.
2.(a). As admin user, run the command "get dataplane | find Corelist" to discover all datapath cores.
bm-edge-1> get dataplane | find Corelist
Corelist : 0,4,8,12,10,6,2,16,20,24,22,18,14,28,32,36,38,34,30,26,40,44,48,50,46,42
2.(b). View the file /var/run/vmware/edge/cpu_usage.json; at the very bottom of the file the usage of the non-DPDK cores is listed, as below:
    "non_dpdk_cpu_per_core": {
        "1": 100.0,
        "3": 9.01,
        "5": 5.88,
        "7": 1.19,
        "9": 0.97,
        "11": 0.98,
        "13": 0.91,
        "15": 1.09,
        "17": 1.26,
        "19": 0.9,
        "21": 0.95,
        "23": 0.89,
        "25": 1.02,
        "27": 0.79,
        "29": 0.75,
        "31": 0.8,
        "33": 0.93,
        "35": 0.91,
        "37": 0.9,
        "39": 0.73,
        "41": 0.87,
        "43": 0.7,
        "45": 0.81,
        "47": 0.63,
        "49": 0.88,
        "51": 0.88

Under normal circumstances, two non-DPDK cores run at high usage. If the issue is present, only one non-DPDK core runs at high usage.
Select a non-DPDK core that is NOT running at high usage, for example core 3 from the output above ('"3": 9.01,').
The core you select must also NOT appear in the Corelist output from step 2.(a) above.
2.(c). On the edge, as root user, run "ps aux | grep kni" to find the PID of the KNI thread:
bm-edge-1:~# ps aux | grep kni
root 17901 97.8 0.0 0 0 ? R Aug10 12648:14 [kni_single]
2.(d). Use the command 'taskset -apc <non-DPDK core found in 2.(b)> <PID found in 2.(c)>'
bm-edge-1:~# taskset -apc 3 17901
pid 17901's current affinity list: 0-51
pid 17901's new affinity list: 3
Note: No service restart is needed in this second workaround.
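
After applying the new affinity, it can be verified immediately, and the RX misses counter on the affected interface should stop increasing. A minimal verification sketch follows; fp-ethX is a placeholder for the affected interface, and the 'find' output filter is assumed to behave here as it does for 'get dataplane' above:
### As root, confirm the affinity now lists only the core chosen in 2.(b)
bm-edge-1:~# taskset -pc $(pidof kni_single)
pid 17901's current affinity list: 3
### As admin, run the command below a few times; the RX misses counter should no longer increase
bm-edge-1> get interface fp-ethX | find misses
RX misses: 123456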