Network Latency observed during vMotion of time sensitive VM

Article ID: 375406

Products

VMware NSX
VMware NSX Networking
VMware NSX-T Data Center

Issue/Introduction

TCP retransmissions are observed during vMotion of time-sensitive database VMs. These database (DB) VMs host applications that are highly latency sensitive.

These symptoms are seen under the following conditions:

1. The host's BFD tunnels to the NSX Edge go to a different subnet than the host's own.
2. The VM is the first VM to be placed on the host, or the first VM to be migrated to the host by vMotion.

 

Environment

VMware NSX
VMware NSX Data Center

Cause

When a VM is migrated by vMotion to a host that has no workloads (and therefore no BFD tunnels UP), traffic for the VM is sent out as soon as the VM arrives. This traffic has no path until the BFD tunnels come UP.

The Edge is usually located in a different subnet than the host, and traffic toward it uses MTEP replication (hierarchical two-tier replication). For every remote L2 domain, the source Transport Node elects one remote MTEP and forwards BUM traffic to the elected MTEP in each remote L2 domain.

A host normally elects an MTEP based on whether the BFD session to each remote VTEP in that subnet is UP or DOWN. One remote VTEP whose BFD session is UP is chosen at random as the MTEP for that subnet. If the host has no BFD session UP to any remote host or Edge in that subnet, hierarchical two-tier replication from the host cannot take place.

BFD tunnels to TEPs in a different subnet (the subnet where the Edges reside) can take up to 2-3 seconds to come UP after session creation.
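
The election behavior described in this section can be pictured with a small model. This is only an illustrative sketch, not NSX code: the `RemoteTep` record, the `elect_mtep` function, the subnet labels, and the two-value BFD state are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteTep:
    """Hypothetical view of one remote TEP as seen from the source host."""
    ip: str
    subnet: str
    bfd_state: str  # "UP" or "DOWN" in this simplified model


def elect_mtep(teps: list[RemoteTep], subnet: str,
               rng: Optional[random.Random] = None) -> Optional[RemoteTep]:
    """Pick, at random, one remote TEP in `subnet` whose BFD session is UP.

    Returns None when no BFD session to that subnet is UP, which models the
    case where two-tier replication toward that subnet cannot happen.
    """
    rng = rng or random.Random()
    candidates = [t for t in teps if t.subnet == subnet and t.bfd_state == "UP"]
    return rng.choice(candidates) if candidates else None


# Example: a freshly populated host whose sessions to the Edge subnet are not UP yet.
edge_teps = [
    RemoteTep("192.0.2.11", "edge-subnet", "DOWN"),
    RemoteTep("192.0.2.12", "edge-subnet", "DOWN"),
]
print(elect_mtep(edge_teps, "edge-subnet"))  # None: no MTEP can be elected yet
```

In this model, the function keeps returning `None` for the Edge subnet until at least one BFD session reaches UP, which corresponds to the window in which the first VM's traffic has no path.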

Resolution

This issue is fixed in NSX 4.2.1.

The fix enhances the MTEP election algorithm so that it chooses a random MTEP while BFD sessions are still coming up. As a result, a latency reduction of 1 to 1.5 seconds is expected when the first VM is migrated to the host by vMotion.
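
One way to picture the enhanced election is sketched below. The article does not describe the actual implementation, so everything here is an assumption for illustration: the function name `elect_mtep_with_fallback`, the `"COMING_UP"` state label, and in particular the choice to prefer sessions that are already UP before falling back to sessions that are still coming up.

```python
import random
from typing import Optional


def elect_mtep_with_fallback(bfd_states: dict[str, str],
                             rng: Optional[random.Random] = None) -> Optional[str]:
    """Elect an MTEP for one remote subnet.

    `bfd_states` maps a remote TEP IP to "UP", "COMING_UP", or "DOWN"
    (hypothetical labels). Sessions that are UP are preferred (an assumption);
    if none is UP, a random TEP whose session is still coming up is chosen
    instead of electing nothing.
    """
    rng = rng or random.Random()
    up = sorted(ip for ip, state in bfd_states.items() if state == "UP")
    if up:
        return rng.choice(up)
    coming_up = sorted(ip for ip, state in bfd_states.items() if state == "COMING_UP")
    return rng.choice(coming_up) if coming_up else None


# Example: sessions toward the Edge subnet were just created and are not UP yet.
states = {"192.0.2.11": "COMING_UP", "192.0.2.12": "DOWN"}
print(elect_mtep_with_fallback(states))  # 192.0.2.11
```

Under these assumptions, an MTEP is available as soon as a session has been created, rather than only after it reaches UP.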

The expected latency, compared with a vMotion to a destination host on which the tunnels are already established, is as follows.

Expected latency before NSX 4.2.1:
Latency = MAX("2 seconds for tunnels to come up", "< 500 msec for Logical Switch/Routing Domain span to be pushed down from Controller") = ~2 seconds

Expected latency with NSX 4.2.1:
Latency = MAX(0, "< 500 msec for Logical Switch/Routing Domain span to be pushed down from Controller") = ~500 msec
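
The arithmetic behind these two figures can be checked with a few lines of Python. This is a sketch only: the function name `expected_latency_ms` is hypothetical, and the span push is taken at its stated upper bound of 500 msec, so the results are worst-case values rather than measurements.

```python
SPAN_PUSH_MS = 500  # upper bound for the span push from the Controller, per the article


def expected_latency_ms(tunnel_wait_ms: int, span_push_ms: int = SPAN_PUSH_MS) -> int:
    """The two delays overlap, so the expected latency is the larger of the two."""
    return max(tunnel_wait_ms, span_push_ms)


before = expected_latency_ms(2000)  # before NSX 4.2.1: wait ~2 s for tunnels to come up
after = expected_latency_ms(0)      # with NSX 4.2.1: no wait for tunnels to come up

print(before, after, before - after)  # 2000 500 1500
```

The 1500 msec difference corresponds to the upper end of the 1 to 1.5 second reduction stated in the Resolution.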

Workaround

Keep a dummy VM on the host so that the BFD tunnels remain UP.