VIP failover on VMs connected to NSX segment results in traffic disruption

Article ID: 322500

Products

VMware NSX

Issue/Introduction

  • Multiple virtual machines (VMs) are deployed with High Availability (HA) in Active/Standby mode.
  • A cluster virtual IP (VIP) address is assigned to the active VM.
  • When a standby VM becomes active, because of an availability issue or a planned failover, the cluster VIP moves to the new active VM.
  • After the failover, the new active VM may become unreachable.
  • This can occur with clusters such as Windows Failover Clusters, OpenShift clusters, third-party load balancer VMs, and others.
  • The default IP Discovery Profile is applied to the segment.
  • Inspection of network traffic shows packets still being delivered to the VM that previously owned the VIP.
  • On the ESXi host, the VIP may appear as a duplicate IP bound to two MAC addresses:

    nsxcli -c get service nsx-cfgagent cache-table l2 remote | grep -B 2 -A 2 "VIP IP"

    UUID LOG_SWITCH_FIB 1 L2_VM_IP None {
      "mac": "MAC1", <<< Old VIP owner
      "ip": "VIP",

    UUID LOG_SWITCH_FIB 1 L2_VM_IP None {
      "mac": "MAC2",  <<< Current VIP owner
      "ip": "VIP"
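
When the cache table is large, the same check can be scripted. The following is a minimal sketch, assuming the saved command output keeps the layout shown above (an entry header containing L2_VM_IP, followed by "mac" and "ip" lines). The function name and sample text are illustrative and not part of any NSX tooling.

```python
import re
from collections import defaultdict

# Assumed layout: each entry starts with a header line containing L2_VM_IP,
# followed by lines such as  "mac": "..."  and  "ip": "..."
ENTRY_START = re.compile(r"\bL2_VM_IP\b")
FIELD = re.compile(r'"(mac|ip)"\s*:\s*"([^"]*)"')


def find_duplicate_ips(cache_table_text):
    """Return {ip: sorted MACs} for every IP bound to more than one MAC."""
    macs_by_ip = defaultdict(set)
    current = {}

    def flush():
        # Record the entry collected so far, then start a new one.
        if "mac" in current and "ip" in current:
            macs_by_ip[current["ip"]].add(current["mac"])
        current.clear()

    for line in cache_table_text.splitlines():
        if ENTRY_START.search(line):
            flush()
        match = FIELD.search(line)
        if match:
            current[match.group(1)] = match.group(2)
    flush()

    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Placeholder text modelled on the excerpt above.
SAMPLE = """
UUID LOG_SWITCH_FIB 1 L2_VM_IP None {
  "mac": "MAC1",
  "ip": "VIP",

UUID LOG_SWITCH_FIB 1 L2_VM_IP None {
  "mac": "MAC2",
  "ip": "VIP"
"""

print(find_duplicate_ips(SAMPLE))
```

Any IP address the function returns is bound to more than one MAC address in the L2 remote cache, which matches the duplicate IP symptom described above.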

Environment

VMware NSX

Cause

The default IP Discovery Profile has Trust On First Use (TOFU) enabled.
With TOFU, NSX keeps the initial IP-MAC-port binding and treats it as never expiring.
TOFU is therefore not suited to VM HA use cases, where the cluster IP is reassigned when a standby VM becomes active.
In addition, VMware Tools based IP discovery reads the IP configuration inside the VM but does not confirm whether the IP is actively being used.
It may therefore discover the cluster IP on a standby VM that is not actively using it. Because discovered IP addresses are used for NSX L2 forwarding and security features, using the default profile in VM HA scenarios can result in a traffic outage.
Some clustering software may also keep the cluster VIP active on the standby node for a short period after failover, which results in a duplicate IP address.
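
The effect of a binding that never expires can be illustrated with a toy model. This is only a sketch of the behaviour described above, not NSX's implementation: the class, the timeout value, and the timestamps are all hypothetical, and binding limits are ignored.

```python
class PortBindings:
    """Toy per-port IP binding table (illustrative only, not NSX code)."""

    def __init__(self, tofu, timeout):
        self.tofu = tofu          # True: learned bindings never expire
        self.timeout = timeout    # age-out used only when tofu is False
        self.last_seen = {}       # ip -> time the IP was last observed

    def observe(self, ip, now):
        self.last_seen[ip] = now

    def active_ips(self, now):
        if self.tofu:
            return set(self.last_seen)
        return {ip for ip, seen in self.last_seen.items()
                if now - seen <= self.timeout}


def vip_owners_after_failover(tofu, timeout=10):
    """Return which ports still hold the VIP shortly after a failover."""
    old_port = PortBindings(tofu, timeout)
    new_port = PortBindings(tofu, timeout)
    old_port.observe("VIP", now=0)     # original active VM announces the VIP
    new_port.observe("VIP", now=100)   # failover: new active VM announces it
    check_time = 105
    ports = (("old", old_port), ("new", new_port))
    return sorted(name for name, port in ports
                  if "VIP" in port.active_ips(check_time))


print(vip_owners_after_failover(tofu=True))    # both ports still hold the VIP
print(vip_owners_after_failover(tofu=False))   # only the new owner remains
```

In this model the stale binding on the old port persists indefinitely with TOFU enabled, which corresponds to the duplicate IP entries seen on the ESXi host. With TOFU disabled, the stale binding ages out and only the new owner remains.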

Resolution

This may be seen on all NSX versions and is a configuration issue.

Workaround:

1. In the NSX UI, go to Networking > Segments > Segment Profiles and create a new IP Discovery Profile with the following settings:

    • Enable Duplicate IP Detection
    • Disable both VMware Tools and VMware Tools - IPv6 discovery methods
    • Enable ARP Snooping and ND Snooping
    • Disable Trust on First Use (TOFU)
    • Increase the ARP Binding Limit or ND Snooping Limit to allow for the number of IP addresses that may exist on a single interface

2. Edit the segment and apply the new IP Discovery Profile.

3. Add third-party load balancer VMs to the Distributed Firewall (DFW) exclusion list.
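
For environments that manage profiles through automation rather than the UI, the settings in step 1 could be expressed as a request body along these lines. This is a sketch under stated assumptions: the field names follow the IPDiscoveryProfile schema of the NSX Policy API as best understood here, and the profile name and limit values are hypothetical. Verify all of them against the API reference for your NSX version before use. The block only builds and prints the body; it does not contact an NSX Manager.

```python
import json


def build_ip_discovery_profile(display_name, arp_binding_limit=4, nd_snooping_limit=4):
    """Build a request body mirroring the workaround settings.

    Field names are assumed from the NSX Policy API IPDiscoveryProfile
    schema; confirm them for your NSX version. Fields not listed here are
    left out on purpose and are not covered by this sketch.
    """
    return {
        "resource_type": "IPDiscoveryProfile",
        "display_name": display_name,
        # Disable Trust on First Use
        "tofu_enabled": False,
        # Enable Duplicate IP Detection
        "duplicate_ip_detection": {"duplicate_ip_detection_enabled": True},
        "ip_v4_discovery_options": {
            # Disable VMware Tools discovery, enable ARP Snooping
            "vmtools_enabled": False,
            "arp_snooping_config": {
                "arp_snooping_enabled": True,
                "arp_binding_limit": arp_binding_limit,
            },
        },
        "ip_v6_discovery_options": {
            # Disable VMware Tools - IPv6 discovery, enable ND Snooping
            "vmtools_v6_enabled": False,
            "nd_snooping_config": {
                "nd_snooping_enabled": True,
                "nd_snooping_limit": nd_snooping_limit,
            },
        },
    }


# "vip-failover-ipd" is a hypothetical profile name.
payload = build_ip_discovery_profile("vip-failover-ipd")
print(json.dumps(payload, indent=2))
```

Such a body would typically be sent with a PATCH request to the profile's Policy API path (assumed here to be /policy/api/v1/infra/ip-discovery-profiles/<profile-id>). Size the two limit values to the number of IP addresses expected on a single interface.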

Additional Information