Backend servers inaccessible after edge redeploy when edge has multiple uplink primary IP addresses

Article ID: 314271

Products

VMware NSX

Issue/Introduction

  • You are using NSX for vSphere
  • After an Edge Services Gateway (ESG) is redeployed, backend servers are unreachable
  • Disabling and then re-enabling the edge firewall restores connectivity
  • The ESG uplink interface has multiple primary IP addresses
  • In the edge logs, vshield_edge_flow_table shows the connection in the SYN_RECV state, with the reply from the backend server reaching the ESG:
node-0-disabled$ cat vshield_edge_flow_table | grep ##.##.##.##
37: tcp 6 29 SYN_RECV src=##.##.##.## dst=##.##.##.## sport=53084 dport=443 pkts=1 bytes=52 src=##.##.##.## dst=##.##.##.## sport=443 dport=53084 pkts=4 bytes=208 mark=262144 rid=173458 use=1

 
  • In a working environment, the connection is expected to be in the ESTABLISHED state, as in the following example:
node-0-disabled$ cat vshield_edge_flow_table | grep ##.##.##.##
14: tcp 6 21369 ESTABLISHED src=##.##.##.## dst=##.##.##.## sport=59813 dport=443 pkts=2 bytes=92 src=##.##.##.## dst=##.##.##.## sport=443 dport=59813 pkts=1 bytes=52 [ASSURED] mark=262144 rid=173458 use=2
  • The expected packet flow is as follows:
    • Ingress to ESG > DNAT > To DST > From DST > SNAT > Egress ESG
  • The observed packet flow stops before the packet leaves the ESG:
    • Ingress to ESG > DNAT > To DST > From DST > SNAT
  • A packet capture on the ESG uplink interface shows ARP requests


NOTE: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
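The flow table check described above can be scripted when the log bundle contains many entries. The following is a minimal Python sketch, assuming the TCP line layout shown in the excerpts (index, protocol, protocol number, timeout, state, then key=value fields for the original and reply directions). The function names parse_flow and stuck_handshakes are hypothetical and are not part of any NSX tooling.

```python
import re

# Leading part of a TCP flow line: "<index>: <proto> <proto-num> <timeout> <STATE> ..."
STATE_RE = re.compile(r"^\s*\d+:\s+(\w+)\s+\d+\s+\d+\s+(\w+)\s")
# key=value fields such as src=..., dport=443, pkts=4
FIELD_RE = re.compile(r"(\w+)=(\S+)")
TUPLE_KEYS = ("src", "dst", "sport", "dport", "pkts", "bytes")


def parse_flow(line):
    """Return a dict describing one TCP flow line, or None if it does not match."""
    match = STATE_RE.match(line)
    if not match:
        return None
    proto, state = match.groups()
    original, reply = {}, {}
    for key, value in FIELD_RE.findall(line):
        if key in TUPLE_KEYS:
            # First occurrence belongs to the original direction,
            # the second occurrence to the reply direction.
            target = original if key not in original else reply
            target[key] = value
    return {"proto": proto, "state": state, "original": original, "reply": reply}


def stuck_handshakes(lines):
    """Return flows in SYN_RECV where the reply direction has already seen packets."""
    stuck = []
    for line in lines:
        flow = parse_flow(line)
        if flow is None or flow["state"] != "SYN_RECV":
            continue
        if int(flow["reply"].get("pkts", 0)) > 0:
            stuck.append(flow)
    return stuck
```

A possible usage is to pass the lines of vshield_edge_flow_table to stuck_handshakes and review the returned entries; a flow listed there matches the symptom described in this article, but it is not proof of the cause on its own.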

Environment

NSX for vSphere 6.4.x

Cause

  • This issue occurs because the return packet does not leave the ESG uplink interface using the correct primary IP address
  • A packet capture on the ESG uplink interface shows ARP requests
    • The ARP request is for one of the primary IP addresses of the uplink interface; however, because that address is on a different subnet from the default gateway, the traffic is dropped
  • The source IP address used in the ARP request is selected at random, and in this instance the incorrect primary IP address was selected
    • To confirm this, run the following GET API call against the edge:
      • GET /api/4.0/edges/{edgeId}/systemcontrol/config
      • The expected response is shown below (arp_announce=0):
<systemControl>
<property>sysctl.net.ipv4.conf.all.arp_announce=0</property>
<property>sysctl.net.ipv4.conf.default.arp_announce=0</property>
</systemControl>
 
  • After the firewall is disabled and re-enabled, the correct source IP address is selected, so connectivity is restored
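The confirmation step above can also be done from a script. The following is a minimal Python sketch, assuming the response has the plain <systemControl> layout shown in this article, that NSX Manager accepts HTTP basic authentication, and that its certificate is trusted by the client. The function names and the manager, user, and password parameters are hypothetical.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def fetch_system_control(manager, edge_id, user, password):
    """GET the systemcontrol config of an edge and return the XML body as text.

    Assumes basic authentication and a certificate the client already trusts.
    """
    url = f"https://{manager}/api/4.0/edges/{edge_id}/systemcontrol/config"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def arp_announce_values(xml_text):
    """Map each arp_announce sysctl key in a <systemControl> document to an int."""
    root = ET.fromstring(xml_text)
    values = {}
    for prop in root.iter("property"):
        key, sep, value = (prop.text or "").strip().partition("=")
        if sep and key.endswith(".arp_announce"):
            values[key] = int(value)
    return values
```

If every value returned by arp_announce_values is 0, the edge is in the state described in this section.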

Resolution

  • Use the following API call to change the arp_announce sysctl values to 1, which requires that the ARP source address be part of the target network.
PUT /api/4.0/edges/{edgeId}/systemcontrol/config

Payload:

<systemControl>
<property>sysctl.net.ipv4.conf.all.arp_announce=1</property>
<property>sysctl.net.ipv4.conf.default.arp_announce=1</property>
</systemControl>

 
  • The change made by this API call persists across upgrades, reboots, and shutdowns.
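For environments where Postman is not available, the PUT call above could be issued from a short script. The following is a minimal Python sketch, assuming HTTP basic authentication and a trusted NSX Manager certificate. The function names build_payload and put_system_control and their parameters are hypothetical. The sketch sends only the two properties shown in this article and does not merge any other systemControl properties that may already be set on the edge.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# The two sysctl keys changed by the resolution in this article.
ARP_KEYS = (
    "sysctl.net.ipv4.conf.all.arp_announce",
    "sysctl.net.ipv4.conf.default.arp_announce",
)


def build_payload(value=1):
    """Build the <systemControl> XML body that sets both arp_announce keys."""
    root = ET.Element("systemControl")
    for key in ARP_KEYS:
        ET.SubElement(root, "property").text = f"{key}={value}"
    return ET.tostring(root, encoding="unicode")


def put_system_control(manager, edge_id, user, password, payload):
    """PUT the payload to the edge systemcontrol config and return the HTTP status."""
    url = f"https://{manager}/api/4.0/edges/{edge_id}/systemcontrol/config"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/xml",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

After the call, repeating the GET request from the Cause section is one way to confirm that both values now read 1.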


NOTE: To configure the Postman API client for use with NSX Manager, see the KB article Configure POSTMAN for REST API Calls with NSX-V.