Performance degradation in ESXi PNIC (NSX Edge VM hosting) observed following an upgrade from NSX-T versions 3.1/3.2 to NSX 4.1.x



Article ID: 345760


Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • The PNIC on the ESXi host is bottlenecked at 300k packets per second (pps), CPU cores run at over 90% utilization, and Flow Cache usage reaches a critical 98%.
  • To diagnose the issue, gather ADF data and netstat reports on the ESXi host where the Edge VM runs, then analyze the CPU cycle distribution to identify the processes consuming excessive CPU.
 


Environment

VMware NSX-T

Cause

After an upgrade from NSX-T 3.x to NSX 4.x with the hosts remaining on ESXi 7.x, performance is lower than it was on 3.x. Object sizes are larger in NSX 4.x, which increases packet processing time.

Resolution

This issue is resolved in the upcoming NSX 4.1.2.2 release.

Workaround:
Option 1:
Check the ESXi version on the host. If the build number is lower than 20842708, upgrade ESXi to 7.0.3 P06 (build 20842708) or later.
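
The build check in Option 1 can be sketched as a small shell helper. This is an illustrative sketch only: the function names `build_from_version` and `needs_upgrade` are hypothetical, and it assumes the version string has the form printed by `vmware -v`, for example "VMware ESXi 7.0.3 build-19482537".

```shell
#!/bin/sh
# Minimum ESXi build named in this article (7.0.3 P06).
MIN_BUILD=20842708

# build_from_version (hypothetical helper): extract the numeric build from a
# string such as "VMware ESXi 7.0.3 build-19482537".
build_from_version() {
    printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# needs_upgrade (hypothetical helper): print "yes" if the build is lower than
# MIN_BUILD, "no" if it is equal or higher, "unknown" if no build was found.
needs_upgrade() {
    build=$(build_from_version "$1")
    if [ -z "$build" ]; then
        echo "unknown"
    elif [ "$build" -lt "$MIN_BUILD" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

On the host, the helper could be called as `needs_upgrade "$(vmware -v)"`, assuming `vmware -v` is available in the ESXi shell.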

Option 2:
Disable Flow Cache on the host where performance degradation is observed
# nsxdp-cli fc disable
Run net-dvs --persist to persist the change immediately:
# net-dvs --persist

OR

Edit /etc/vmware/nsx/nsx-cfgagent.xml as follows:

<flowCache>
  <enabled>false</enabled>
  <mcastEnabled>false</mcastEnabled>
</flowCache>
 
Then restart the nsx-cfgagent service:
/etc/init.d/nsx-cfgagent restart
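
After editing the file, the setting can be read back before or after the restart. The following is a minimal sketch, not a supported tool: the function name `flowcache_enabled` is hypothetical, and it assumes the `<flowCache>` element is laid out one child element per line, as in the fragment above.

```shell
#!/bin/sh
# flowcache_enabled (hypothetical helper): print the value of <enabled>
# found between <flowCache> and </flowCache> in the given file.
# Assumes one element per line, as in the fragment shown in this article.
flowcache_enabled() {
    sed -n '/<flowCache>/,/<\/flowCache>/s/.*<enabled>\(.*\)<\/enabled>.*/\1/p' "$1"
}
```

For example, `flowcache_enabled /etc/vmware/nsx/nsx-cfgagent.xml` should print `false` once the edit above is in place.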

Additional Information

Impact/Risks:
Applications time out because of reduced throughput and TCP retransmissions.