- When lcores are allocated to an NSX EDP switch, if the number of allocated lcores exceeds the number of physical queues available on the physical NIC, the configuration can result in an environment that does not guarantee NUMA alignment for all workloads deployed on the server.
As an example, the following EDP switch is configured with 18 lcores:
Example: ENS switch list:
name           swID  maxPorts  numActivePorts  numPorts  mtu   numLcores  lcoreIDs
------------------------------------------------------------------------------
DvsPortset-3   1     128       8               8         9000  18         4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
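This switch list can be gathered directly on the host. As a minimal sketch, assuming the nsxdp-cli utility present on NSX-prepared ESXi hosts (verify the exact sub-command on your NSX release):

# nsxdp-cli ens switch list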
But the number of available queues for the PNIC is 8:
ENS port list for Switch DvsPortset-3
portID      ensPID  TxQ  RxQ  hwMAC              numMACs  type    Queue Placement(tx|rx)
------------------------------------------------------------------------------
23152xxxx   0       8    8    00:00:00:00:00:00  0        UPLINK  4 5 6 7 8 9 10 11 | 5 4 5 6 7 8 9 10
23152xxxx   1       8    8    00:00:00:00:00:00  0        UPLINK  4 5 6 7 8 9 10 11 | 5 4 5 6 7 8 9 10
- This configuration resulted in an lcore allocation where only lcores from NUMA 0 were used for all uplink data processing.
Note how the lcores associated with the uplinks (4-11) are all on NUMA 0:
ENS NUMA affinity
Lcore ID   Switch         Affinity
--------   ------------   --------
0          DvsPortset-2   0
1          DvsPortset-2   0
2          DvsPortset-2   1
3          DvsPortset-2   1
4          DvsPortset-3   0
5          DvsPortset-3   0
6          DvsPortset-3   0
7          DvsPortset-3   0
8          DvsPortset-3   0
9          DvsPortset-3   0
10         DvsPortset-3   0
11         DvsPortset-3   0
12         DvsPortset-3   0
13         DvsPortset-3   1
14         DvsPortset-3   1
15         DvsPortset-3   1
16         DvsPortset-3   1
17         DvsPortset-3   1
18         DvsPortset-3   1
19         DvsPortset-3   1
20         DvsPortset-3   1
21         DvsPortset-3   1
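To summarize how each switch's lcores are spread across NUMA nodes, the table above can be post-processed on the host. A minimal sketch, assuming the output has been saved to /tmp/ens-numa.txt (the file name is illustrative):

# Count how many lcores each ENS switch has on each NUMA node
awk '$2 ~ /DvsPortset/ { n[$2 " NUMA " $3]++ } END { for (k in n) print k ": " n[k] " lcores" }' /tmp/ens-numa.txt

For the example above, DvsPortset-3 has 9 lcores on NUMA 0 and 9 on NUMA 1, yet both of its uplinks place their queues only on the NUMA 0 lcores (4-11).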
With that configuration, a data-plane-intensive application running on NUMA 1 has to cross NUMA boundaries to send traffic on the physical network, resulting in lower throughput for that application compared to the same application running on NUMA 0.
For more details on the recommended configuration to guarantee NUMA alignment, refer to the "NUMA alignment on multi-socket systems" section in the latest TCP Performance Tuning Guide.
Make sure you validate the following:
Step 1: Determine the maximum number of lcores supported by the host:
- SSH to the ESXi host and run the following command:
# esxcli network ens maxLcores get
26
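If you want to reuse this value in a scripted check, a minimal sketch (the variable name is illustrative and assumes the command prints only the numeric value, as shown above):

MAX_LCORES=$(esxcli network ens maxLcores get)
echo "Maximum ENS lcores supported by this host: ${MAX_LCORES}"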
Step 2: Determine the maximum number of physical NIC driver queues supported on your host:
- SSH to the ESXi host and run the following command:
# nsxdp-cli ens port list
portID     ensPID  TxQ  RxQ  hwMAC              numMACs  type    Queue Placement(tx|rx)
------------------------------------------------------------------------------
22817xxx   0       8    8    0c:42:a1:98:88:08  0        UPLINK  0 1 2 3 4 5 6 7
Note: Make a note of the TxQ and RxQ values (8 in this example).
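To pull just the uplink queue counts out of this output, a minimal sketch (the awk field positions assume the column layout shown above):

# Print TxQ/RxQ for every uplink port on the host
nsxdp-cli ens port list | awk '$7 == "UPLINK" { print "port " $1 ": TxQ=" $3 " RxQ=" $4 }'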
Step 3: Align the number of lcores on the ENS switch with the number of driver queues, which is 8 in this example, instead of the 18 lcores currently configured.
NOTE: If the driver module supports more than 8 queues, the lcore count should be aligned to the maximum allowed by the driver module parameters. Involve the driver vendor before making these changes.
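Before engaging the vendor, you can review which driver backs the uplink and what module parameters it currently exposes. A minimal sketch using standard esxcli commands (the module name nmlx5_core is only an example; substitute the driver shown for your uplink, and do not change parameters without vendor guidance):

# List the physical NICs and the driver module behind each one
esxcli network nic list
# Review the parameters exposed by that driver module
esxcli system module parameters list -m nmlx5_core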
Conclusion: The driver reports 8 TxQ and 8 RxQ, but the ENS switch list shows 18 lcores. The lcores are therefore overcommitted and misaligned with the physical queues, which is why the application experiences the throughput issue.
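As a closing sketch, the whole check can be scripted on the host so that the ENS switch's lcore count is compared against the uplink TxQ count. The switch name DvsPortset-3 and the field positions are taken from the example outputs above and are only illustrative; verify the sub-command names on your NSX release:

# Compare configured lcores against available uplink transmit queues
LCORES=$(nsxdp-cli ens switch list | awk '/DvsPortset-3/ { print $7 }')
TXQ=$(nsxdp-cli ens port list | awk '$7 == "UPLINK" { print $3; exit }')
if [ "${LCORES}" -gt "${TXQ}" ]; then
    echo "Misaligned: ${LCORES} lcores configured but only ${TXQ} TxQ available on the uplink"
else
    echo "Aligned: ${LCORES} lcores within ${TXQ} available queues"
fi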