East-West traffic between workloads behind different T1 is impacted, when NAT is configured on T0

Article ID: 325077

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

In a Multi-tenant (T1s) environment with the following conditions:

  1. Workloads behind T1s have SNAT rules configured on the T0 for S-N communication
  2. A T1 has a Service Router (SR) component configured
  3. Tenant T1s are connected to the same T0 router, and the SR component of a Tenant T1 is active on the same Edge node where the T0 SR is active.
  4. In the T0 NAT configuration, any DNAT rule is configured
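
The four conditions above can be expressed as a simple check. This is only an illustrative sketch: the `Tier0` and `Tier1` classes, their field names, and the router names are hypothetical and do not correspond to any NSX API or inventory object.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tier1:
    # Hypothetical model of a tenant T1 router.
    name: str
    has_sr: bool = False                     # condition 2: SR component configured
    active_edge_node: Optional[str] = None   # Edge node where the T1 SR is active


@dataclass
class Tier0:
    # Hypothetical model of the shared T0 router.
    active_edge_node: str                    # Edge node where the T0 SR is active
    snat_rule_count: int = 0                 # condition 1: SNAT rules on the T0
    dnat_rule_count: int = 0                 # condition 4: any DNAT rule on the T0
    tier1s: List[Tier1] = field(default_factory=list)


def matches_conditions(t0: Tier0) -> bool:
    """Return True when all four conditions listed above hold."""
    # Condition 3: a T1 SR is active on the same Edge node as the T0 SR.
    sr_colocated = any(
        t1.has_sr and t1.active_edge_node == t0.active_edge_node
        for t1 in t0.tier1s
    )
    multi_tenant = len(t0.tier1s) > 1
    return (
        multi_tenant
        and t0.snat_rule_count > 0
        and t0.dnat_rule_count > 0
        and sr_colocated
    )


# Example values loosely based on the topology in this article
# (router names are made up for illustration).
t0 = Tier0(
    active_edge_node="edge01",
    snat_rule_count=2,
    dnat_rule_count=1,
    tier1s=[
        Tier1("t1-a", has_sr=True, active_edge_node="edge01"),
        Tier1("t1-b"),
    ],
)
print(matches_conditions(t0))
```
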
Below is an example from a PKS topology. The same issue is seen in a non-PKS topology as well.
 

Topology Information:

- Bosh IP: 172.16.80.2

- Bosh IP 172.16.80.2 is SNAT’ed at the T0 to 192.168.80.2

- K8s Master VM IP: 172.16.90.2

- K8s Master VM IP 172.16.90.2 is SNAT’ed at the T0 to 192.168.90.2

- There’s a DNAT rule configured at the T0 to translate 192.168.80.2 to 172.16.80.2 for North-South communication

 

Note:

      This DNAT rule is configured for illustration purposes in this use case. The issue occurs even if the DNAT rule is configured for a completely different workflow.

 

  • East-West traffic between workloads behind different T1s is impacted when communication happens over their private IPs (i.e. non-NAT’ed IPs). In the above example, when 172.16.80.2 tries to communicate with 172.16.90.2, the communication is impacted.

 

Note:

  • East-West traffic between workloads behind different T1s works as expected when the workloads communicate over their public (NAT’ed) IPs. In the above example, when 172.16.80.2 tries to communicate with 192.168.90.2, the communication works as expected.
  • North-South traffic to destinations outside NSX-T works as expected. In the above example, when 172.16.80.2 sends a packet to an external workload north of the T0 Logical Router, SNAT occurs and the communication works as expected.


Environment

VMware NSX-T Data Center

Cause

Starting in 2.4.2, the SNAT rule takes effect on traffic between the T0 and T1. As a result, the traffic is SNATed twice: once when the request egresses toward the destination, and again when the response returns from the destination. This leads to the workload dropping the traffic. The following is the packet walk for the above example:

Packet Walk:

Request:

  1. Packet leaves Opsmgr with source IP 172.16.80.2 and Destination 172.16.90.2
  2. Traffic arrives at the T1 DR interface on the Source host and is routed to the T1 SR which resides on Edge Node edge01. Source is 172.16.80.2 and Destination is 172.16.90.2
  3. From the edge node, the traffic is routed from the T1 SR to the T0 DR on edge01. Source IP is 172.16.80.2 and Destination is 172.16.90.2
  4. Starting with 2.4.2, the firewall is automatically enabled on Linked Ports, and SNAT is applied as traffic egresses the T0 DR interface towards the Destination T1. At this point, the Source IP is SNATed to 192.168.80.2 and Destination is 172.16.90.2
  5. Packet is sent to the destination host via Geneve Encapsulated packet

 

Response:

  1. Destination sends a packet with Source IP 172.16.90.2 and Destination 192.168.80.2
  2. Traffic is routed via the T1 DR interface on the source host to T0 DR on the source host. Since there’s no T1 SR on this T1 router, the traffic will directly be forwarded from T1 DR to T0 DR on this host. Source IP of this packet is 172.16.90.2 and Destination 192.168.80.2
  3. Since the firewall is enabled by default on Linked Ports, the traffic will be forwarded from the T0 DR on this host to the T0 SR on edge node edge01. At this point, the Source IP of the packet is 172.16.90.2 and Destination is 192.168.80.2
  4. From the T0, the packet will be forwarded to the Source T1 DR interface. Here SNAT and Reverse NAT will occur, and the source IP will now be changed to 192.168.90.2 and Destination to 172.16.80.2
  5. Packet will be sent to the Destination host via a Geneve Encapsulated packet to destination 172.16.80.2 with Source IP of 192.168.90.2
  6. The receiving VM (172.16.80.2) will drop the packet, since the response is received from a different IP, "192.168.90.2", instead of the expected "172.16.90.2"
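
The request and response walks above can be traced with a small simulation. This is a simplified sketch, not NSX code: the `Packet` class and `t0_forward` function are hypothetical, and the model assumes only the behavior described in this article, namely that the T0 applies reverse NAT to the destination and SNAT to the source on T0-to-T1 traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# SNAT rules configured on the T0 in the example topology.
SNAT_RULES = {
    "172.16.80.2": "192.168.80.2",
    "172.16.90.2": "192.168.90.2",
}
# Reverse NAT undoes an earlier SNAT translation (assumed here to be
# the simple inverse of the SNAT table).
REVERSE_NAT = {public: private for private, public in SNAT_RULES.items()}


def t0_forward(pkt: Packet) -> Packet:
    """Hypothetical model of T0 handling for traffic sent towards a T1."""
    dst = REVERSE_NAT.get(pkt.dst, pkt.dst)  # reverse NAT on the destination
    src = SNAT_RULES.get(pkt.src, pkt.src)   # SNAT on the source
    return Packet(src, dst)


# Request: 172.16.80.2 -> 172.16.90.2
request = Packet("172.16.80.2", "172.16.90.2")
at_dest = t0_forward(request)      # source becomes 192.168.80.2

# Response: the destination replies to the address it saw as the source.
reply = Packet(at_dest.dst, at_dest.src)
at_source = t0_forward(reply)      # source becomes 192.168.90.2

# The original sender expects the reply to come from the IP it contacted.
dropped = at_source.src != request.dst
print(at_dest, at_source, dropped)
```

In this model the reply reaches 172.16.80.2 with source 192.168.90.2 rather than 172.16.90.2, which is the mismatch described in step 6.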

Resolution

This issue is resolved in VMware NSX-T Data Center 2.5.

Workaround:

If there are no services (such as NAT or Firewall) on the T1 SR, it is safe to detach it from the edge cluster. This is a two-step process, as described below:

  1. Choose T1 Router → Services → Edge Firewall → Click on Disable Firewall
  2. Click on Overview → Click Edit (Next to Summary) → Click the ‘x’ next to Edge Cluster to detach the edge cluster from T1

 

If you are not able to perform this workaround or have any additional questions, file a support request with VMware Support and quote this Knowledge Base article ID (71363) in the problem description.