Increased SAP HANA response time due to network latency with VMXNET3

Article ID: 324504

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article provides information on a known issue with SAP HANA and VMware VMXNET3 virtual NICs.

Symptoms:
You may observe increased SAP HANA response time when a large number of concurrent users (~44,000) are running simultaneous transactions on the system.

The tables below show latency and throughput numbers for the VMXNET3 driver compared against a bare metal Linux installation:

Network Latency: VMXNET3 vs Bare Metal comparison (measured in microseconds with up to 65000 users)
           | Baseline latency with no load (µs) | Latency at peak load with 44000-65000 concurrent users (µs)
Bare Metal | 26                                 | 95
VMXNET3    | 84                                 | 318.82

Throughput: VMXNET3 vs Bare Metal comparison (measured in transactions per hour)
           | 35% CPU, 23000 concurrent users | 65% CPU, 44000 concurrent users | High load, 64000 concurrent users
Bare Metal | 2914644                         | 5550392                         | 7105993
VMXNET3    | 2911979                         | 5502247                         | 6472627
Delta      | -0.09%                          | -0.87%                          | -8.91%
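
The Delta row can be reproduced from the two throughput rows. The following is a minimal sketch in Python that assumes the delta is the relative difference of VMXNET3 against bare metal; the dictionary keys and the function name are illustrative only.

```python
# Transactions per hour, copied from the throughput table.
bare_metal = {"23000 users": 2914644, "44000 users": 5550392, "64000 users": 7105993}
vmxnet3 = {"23000 users": 2911979, "44000 users": 5502247, "64000 users": 6472627}


def delta_percent(virtual: int, physical: int) -> float:
    """Relative throughput difference of VMXNET3 versus bare metal, in percent."""
    return (virtual - physical) / physical * 100.0


deltas = {
    load: round(delta_percent(vmxnet3[load], bare_metal[load]), 2)
    for load in bare_metal
}

for load, delta in deltas.items():
    print(f"{load}: {delta:+.2f}%")
```

Rounded to two decimals, this yields -0.09%, -0.87% and -8.91%, matching the Delta row.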

Notes:
Network latency of the VMXNET3 driver is measured in µs, while OLTP transaction response time is measured in ms. A SAP LUW consists of many (for example, hundreds of) database LUWs.
  • The network latency in µs of each round trip accumulates across the DB LUWs, ultimately adding up to an observed delay of milliseconds for each SAP transaction.
  • The network latency of VMXNET3 causes higher OLTP response time when compared to bare metal for the same business transactions.
  • Users should expect VMXNET3 to add 200-300 microseconds to round trip time when compared to bare metal at high load. This results in an increase of 40-87 milliseconds in OLTP response time at 65% CPU utilization (44000 concurrent users).
  • While OLTP response time delays of 40-87 ms were observed, VMware observed a delta of less than 10% in transactions per hour between VMXNET3 and bare metal under loads of 23000 to 64000 concurrent users.
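
The accumulation described in the notes above is simple arithmetic: added delay per transaction = number of network round trips × added latency per round trip. The sketch below illustrates this in Python. The round trip counts (200 and 300) are hypothetical values chosen for illustration, since this article does not state how many round trips each transaction performs; the peak-load latencies are the measured values from the latency table.

```python
def added_response_time_ms(round_trips: int, added_latency_us: float) -> float:
    """Extra response time in ms when every round trip is slower by added_latency_us."""
    return round_trips * added_latency_us / 1000.0


# Measured peak-load latencies from the latency table, in microseconds.
peak_bare_metal_us = 95.0
peak_vmxnet3_us = 318.82
added_us = peak_vmxnet3_us - peak_bare_metal_us  # roughly 223.82 us per round trip

# Hypothetical round trip counts per SAP transaction (an assumption, not a measurement).
for round_trips in (200, 300):
    extra_ms = added_response_time_ms(round_trips, added_us)
    print(f"{round_trips} round trips -> about {extra_ms:.1f} ms added")
```

Under these assumptions, a few hundred round trips per transaction are enough to turn a difference of a few hundred microseconds into tens of milliseconds.
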
SAP and Database Logical Unit of Work (LUW):
A SAP LUW is a logical unit of work that spans several dialog steps. Consider a complex business transaction with n dialog steps, where each dialog step can involve many DB LUWs whose database operations are logically related to each other. For data consistency, the database changes for this business transaction must either all be made at once or all be rolled back. This is where the SAP LUW becomes critical: it maintains the logical relationship between the many DB LUWs (dialog steps) and makes all the changes to the database (COMMIT) in the final DB LUW.
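
The all-or-nothing behavior described above can be pictured with a toy model. The following Python sketch is not SAP code and does not use any SAP API; the class `ToyLuw` and its methods are hypothetical names, and a plain dictionary stands in for the database. It only illustrates the idea of collecting changes across dialog steps and applying them together in a final commit.

```python
class ToyLuw:
    """Toy model: collect changes across dialog steps, apply them all at once."""

    def __init__(self) -> None:
        self.pending: list[tuple[str, str]] = []

    def record(self, key: str, value: str) -> None:
        """Register a change during a dialog step without touching the database."""
        self.pending.append((key, value))

    def commit(self, database: dict) -> None:
        """Apply every pending change together, then clear the pending list."""
        database.update(dict(self.pending))
        self.pending.clear()

    def rollback(self) -> None:
        """Discard every pending change; the database stays untouched."""
        self.pending.clear()


database: dict = {}
luw = ToyLuw()
luw.record("sales_order", "created")   # dialog step 1
luw.record("delivery", "created")      # dialog step 2
print(database)                        # still empty: nothing applied yet
luw.commit(database)                   # final step applies both changes
print(database)
```

Until `commit` is called, the dictionary is unchanged, which mirrors the requirement that partial changes never become visible.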

The following SAP business transactions (interactive/online mode) were used to measure system performance:
  • VA01 - Create Sales Order
  • VL01N - Create Delivery
  • VA03 - Display Order
  • VL02N - Post Goods Issue
  • VA05 - List Open Orders
  • VF01 - Create Invoice
In many cases these transactions will be run by users in online/interactive mode. However, there are situations where the equivalent BAPIs or IDocs will be used via interfaces for batch or mass transactions. VMware recommends that users conduct performance tests on production-like environments to determine whether the increased response time will negatively impact their business processes.
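
As a starting point for such tests, round-trip latency between two hosts can be sampled with a small TCP echo test. The Python sketch below is a generic illustration, not the method VMware used to produce the numbers in this article; all function names are hypothetical. For a self-contained demonstration it runs both ends on the loopback interface, which does not exercise a VMXNET3 NIC. In a real test the echo server would run on the remote host.

```python
import socket
import statistics
import threading
import time


def run_echo_server(server_socket: socket.socket) -> None:
    """Accept one connection and echo every received byte back."""
    connection, _ = server_socket.accept()
    with connection:
        while True:
            data = connection.recv(64)
            if not data:
                break
            connection.sendall(data)


def measure_round_trips_us(host: str, port: int, count: int = 1000) -> list[float]:
    """Return `count` TCP round-trip times in microseconds for a 1-byte payload."""
    samples = []
    with socket.create_connection((host, port)) as client:
        # Disable Nagle's algorithm so each small message is sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter_ns()
            client.sendall(b"x")
            client.recv(1)
            samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


def demo_on_loopback(count: int = 200) -> list[float]:
    """Start a local echo server on a free port and measure against it."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_socket.listen(1)
    port = server_socket.getsockname()[1]
    thread = threading.Thread(target=run_echo_server, args=(server_socket,), daemon=True)
    thread.start()
    samples = measure_round_trips_us("127.0.0.1", port, count)
    thread.join(timeout=5)
    server_socket.close()
    return samples


if __name__ == "__main__":
    results = demo_on_loopback()
    print(f"median: {statistics.median(results):.1f} us, max: {max(results):.1f} us")
```

Comparing the median and tail values from a virtual machine and from a bare metal host under similar load gives a rough indication of the added latency, although dedicated benchmarking tools and real SAP workloads remain the better basis for a decision.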

Environment

VMware vSphere 7.0.x

Cause

VMware is actively investigating the root cause of this increased network latency and is working with SAP to troubleshoot the issue and determine the appropriate corrective action.

Resolution

This KB will be updated with the release version of the fix once the root cause is identified.