Traffic disruption when Preserved Client IP is enabled on NSX Advanced Load Balancer

Article ID: 322558

Products

VMware Cloud Director VMware NSX

Issue/Introduction

Symptoms:

  • You have VMware vCloud Director, VMware NSX Advanced Load Balancer (Avi), and VMware NSX deployed in the environment.
  • You are configuring NAT rules on VMware NSX via VMware vCloud Director.
  • You are using the Preserved Client IP feature of NSX Advanced Load Balancer.
  • Both the NSX Advanced Load Balancer Service Engine (SE) and the backend server are on downlinks of the same VMware NSX Tier-1 Gateway.
  • You have a DNAT rule and an SNAT rule on the same Tier-1 Gateway.
  • SNAT or DNAT is not applied as expected.
  • Traffic is restored if you disable either the SNAT rule or the DNAT rule.

Example:

DNAT rule applied to public traffic entering from the north on the Tier-1's uplink:

["dnat": "rule 1001 at 1 in protocol any postnat from any to ip 203.##.##.## dnat ip 198.##.##.1]
where 198.##.##.1 is the Avi load balancer VIP, which the load balancer then translates to the backend server address 10.0.0.#.
[source IP: 172.##.##.1, destination IP: 10.0.0.#]

SNAT rule on the same Tier-1's downlink:

["snat": "rule 1000 at 1 out protocol any prenat from ip 10.0.#.#/24 to any snat ip 203.##.##.##]
The SNAT rule above is applied when the backend server replies: [source: 10.0.#.#, destination: 172.##.##.1]

The return traffic is sent to the Avi SE VM, which expects the return packet to mirror the first flow above. Because the SNAT rule has rewritten the source address, the SE sees packets from a different source and drops them.
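
The flow in this example can be traced with a small Python sketch. It is only an illustrative model, not NSX or Avi code: the concrete addresses are hypothetical stand-ins for the masked values above, and the functions `dnat`, `load_balance`, `snat`, and `se_accepts` are names invented for this sketch.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical stand-ins for the masked addresses in the example.
CLIENT = "172.16.0.1"      # 172.##.##.1
PUBLIC = "203.0.113.10"    # 203.##.##.##
VIP = "198.51.100.1"       # 198.##.##.1
BACKEND = "10.0.0.5"       # 10.0.0.#
SNAT_MATCH = ip_network("10.0.0.0/24")  # 10.0.#.#/24


def dnat(pkt: Packet) -> Packet:
    """Rule 1001: rewrite the public destination to the VIP."""
    return replace(pkt, dst=VIP) if pkt.dst == PUBLIC else pkt


def load_balance(pkt: Packet) -> Packet:
    """Preserved client IP: only the destination changes, the source is kept."""
    return replace(pkt, dst=BACKEND) if pkt.dst == VIP else pkt


def snat(pkt: Packet) -> Packet:
    """Rule 1000: rewrite sources in the backend subnet to the public IP."""
    return replace(pkt, src=PUBLIC) if ip_address(pkt.src) in SNAT_MATCH else pkt


def se_accepts(forward: Packet, reply: Packet) -> bool:
    """Assumed SE behavior: the reply must mirror the forward flow."""
    return reply.src == forward.dst and reply.dst == forward.src


forward = load_balance(dnat(Packet(src=CLIENT, dst=PUBLIC)))
reply = Packet(src=BACKEND, dst=CLIENT)
reply_after_snat = snat(reply)

print(forward)                                # source is the client, destination is the backend
print(se_accepts(forward, reply))             # True
print(se_accepts(forward, reply_after_snat))  # False
```

In this model the untranslated reply mirrors the forward flow, while the reply that has passed through the SNAT rule carries the public address as its source and no longer matches, which corresponds to the drop described above.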

Environment

VMware NSX-T Data Center
VMware Cloud Director 10.x

Cause

NAT processing on downlinks is usually skipped. The exception is when any DNAT rule is configured on the logical router: DNAT must be processed in the incoming direction, that is, for traffic entering the logical router from outside, even when the ingress interface is a downlink. The code checks whether any DNAT rule is configured on the logical router and, if so, enforces NAT (both SNAT and DNAT) on the downlinks.
As a result, any generic SNAT rule configured at the logical router scope is applied on the downlinks as well.
If the SNAT rule is instead applied at the interface level, it is applied only on the specified interface.
However, NAT rules configured from vCloud Director cannot be scoped to a specific interface; the only available option is to apply the rule to the logical router as a whole.
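
The decision described above can be summarized in a short Python sketch. It is a simplified model of the behavior as this article describes it, not the actual NSX implementation; `NatRule`, `nat_enforced_on_downlink`, and `snat_rules_applied` are hypothetical names, and the interface labels are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NatRule:
    action: str                      # "SNAT" or "DNAT"
    interface: Optional[str] = None  # None means logical-router scope


def nat_enforced_on_downlink(rules: List[NatRule]) -> bool:
    """NAT is processed on downlinks only if some DNAT rule exists."""
    return any(r.action == "DNAT" for r in rules)


def snat_rules_applied(rules: List[NatRule], interface: str,
                       is_downlink: bool) -> List[NatRule]:
    """Return the SNAT rules evaluated for traffic on the given interface."""
    if is_downlink and not nat_enforced_on_downlink(rules):
        return []  # usual case: NAT is skipped on downlinks
    # Router-scoped rules apply everywhere; interface-scoped rules
    # apply only on the interface they name.
    return [r for r in rules
            if r.action == "SNAT" and r.interface in (None, interface)]


# Rules as vCloud Director creates them: both at logical-router scope.
vcd_rules = [NatRule("DNAT"), NatRule("SNAT")]
# Same SNAT rule without any DNAT rule on the router.
snat_only = [NatRule("SNAT")]
# SNAT rule scoped to a (hypothetical) uplink interface.
scoped = [NatRule("DNAT"), NatRule("SNAT", interface="uplink")]

for name, rules in [("vcd_rules", vcd_rules),
                    ("snat_only", snat_only),
                    ("scoped", scoped)]:
    hit = snat_rules_applied(rules, "downlink-backend", is_downlink=True)
    print(name, "SNAT applied on downlink:", bool(hit))
```

Only the first rule set, which matches what vCloud Director produces, results in SNAT on the downlink. Removing the DNAT rule or scoping the SNAT rule to another interface avoids it in this model, consistent with the symptom that traffic is restored when either rule is disabled.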

Resolution

This is a known issue impacting VMware NSX.
Disable the distributed routing feature from VMware vCloud Director using the following steps:

  • First, in the Provider portal, go to Cloud Resources -> Networking -> Edges, select your edge gateway, select Edit, select "Allow Non-Distributed Routing", and save.
  • Then, in the tenant portal, go to Networking, select the desired network, select Edit -> Connection, de-select the "Distributed Routing" toggle, and select Save.

When distributed routing is disabled, the segment connected to the SE VMs is disconnected from the Tier-1 Gateway, and a service interface is created on the Tier-1 Gateway and connected to the SE VM segment.

This way, the traffic passes through a service interface rather than a downlink interface.
When traffic first enters the logical router on a service interface and then exits on a downlink interface, NAT processing does not happen on the downlink interface because of the double-lookup optimization.
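
The effect of this change can be expressed as a minimal Python sketch. It models only the two cases this article describes and is not NSX code; `IfType` and `downlink_nat_processed` are hypothetical names, and the rule that DNAT presence triggers downlink NAT is taken from the Cause section.

```python
from enum import Enum


class IfType(Enum):
    DOWNLINK = "downlink"
    SERVICE = "service"


def downlink_nat_processed(ingress: IfType, any_dnat_rule: bool) -> bool:
    """Whether NAT is processed on the egress downlink in this model."""
    if ingress is IfType.SERVICE:
        # Double-lookup optimization: traffic that entered on a service
        # interface is not NAT-processed again on the downlink.
        return False
    # Downlink-to-downlink traffic: NAT is enforced once any DNAT rule exists.
    return any_dnat_rule


# Before the change: the SE segment is attached as a downlink.
print(downlink_nat_processed(IfType.DOWNLINK, any_dnat_rule=True))  # True
# After the change: the SE segment is attached through a service interface.
print(downlink_nat_processed(IfType.SERVICE, any_dnat_rule=True))   # False
```

With the SE segment behind a service interface, the router-scoped SNAT rule is no longer evaluated on the downlink in this model, even though the DNAT rule is still present.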