Configure and apply NSX-T Segment IP discovery profile when using high availability (HA) for Virtual Machines.

Article ID: 322500

Products

VMware NSX

Issue/Introduction

This KB article outlines known issues and symptoms that occur when VMs using High Availability (HA) technologies are connected to NSX segments that use the default IP discovery profile, and provides a suitable workaround.

Multiple Virtual Machine (VM) deployments are configured with High Availability (HA) in Active/Standby Mode.
A cluster IP address is allocated to an active VM.
When a standby VM becomes active due to availability issues or failover, the cluster IP address is reallocated to the new active VM.
When the default IP discovery profile is in use and a failover occurs, the new active VM may become unreachable.
This is also applicable to Windows Failover Clusters.

Environment

VMware NSX
VMware NSX-T Data Center

Cause

The default IP discovery profile has Trust On First Use (TOFU) enabled.
TOFU keeps the initial IP-MAC-Port binding and assumes that it never expires.
TOFU is not suited to VM HA use cases, because the cluster IP is reassigned when a standby VM becomes active.
In addition, VMware Tools-based IP discovery checks the IP configuration inside the VM but does not confirm whether the IP is actively being used.
It may discover the cluster IP from a standby VM that is not actively using it. Because discovered IP addresses are used for NSX L2 forwarding and security features, using the default profile in VM HA scenarios can result in a traffic outage.
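
The effect described above can be illustrated with a deliberately simplified toy model. This is not how NSX implements IP discovery. The class name, method names, and IP addresses are all hypothetical, and the model assumes only what this article states: under TOFU a binding is never aged out, and a port cannot hold more bindings than its limit.

```python
from dataclasses import dataclass, field


@dataclass
class PortBindings:
    """Toy model of the discovered-IP bindings for one segment port."""
    tofu: bool
    limit: int = 1
    ips: list = field(default_factory=list)

    def learn(self, ip):
        """Bind a snooped IP if there is room; return True if it is bound."""
        if ip not in self.ips and len(self.ips) < self.limit:
            self.ips.append(ip)
        return ip in self.ips

    def expire(self, ip):
        """Age out a binding. Under TOFU the binding is kept indefinitely."""
        if not self.tofu and ip in self.ips:
            self.ips.remove(ip)


def failover(tofu, limit, vip="10.0.0.100"):
    """Return (old_active_ips, new_active_ips) after a simulated failover."""
    old_active = PortBindings(tofu, limit)
    new_active = PortBindings(tofu, limit)
    old_active.learn(vip)          # active VM owns the cluster IP
    new_active.learn("10.0.0.12")  # standby VM is bound to its own IP only
    # Failover: the cluster IP moves from the old active VM to the new one.
    old_active.expire(vip)
    new_active.learn(vip)
    return old_active.ips, new_active.ips


# Default-like settings: the cluster IP stays bound to the old port only.
print(failover(tofu=True, limit=1))
# TOFU disabled and limit raised: the cluster IP follows the new active VM.
print(failover(tofu=False, limit=2))
```

In this sketch, the first call leaves the cluster IP bound to the old active port and absent from the new one, which corresponds to the unreachable state described in this article.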

Resolution

This is a configuration issue.

Workaround:

Create a new IP discovery profile.
Disable both VMware Tools-based IP discovery options.
Enable ARP snooping and ND snooping, and disable Trust On First Use (TOFU).
Set the ARP binding limit to the required number of bindings. The default limit is 1. If the VM has a VIP, the limit should be at least 2, and it may need to be higher if resources on the nodes also have floating IPs. The ARP binding limit must be equal to or greater than the maximum number of IPs that may be present on the VM at any one time.
Apply the new profile to the segment that has HA deployments.
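
For environments that are managed through the API instead of the UI, the profile settings in the steps above might be expressed as a request body along the following lines. This is a minimal sketch, not a verified procedure. The field names (`tofu_enabled`, `ip_v4_discovery_options`, `arp_snooping_config`, and so on) are assumed from the NSX Policy API IP discovery profile schema and should be checked against the API reference for your NSX version. The function names and the profile name are hypothetical.

```python
import json


def required_arp_binding_limit(primary_ips, vips, floating_ips=0):
    """Maximum number of IPs that may be present on the VM at any one time."""
    return primary_ips + vips + floating_ips


def build_ip_discovery_profile(name, arp_binding_limit):
    """Build a profile body matching the workaround steps.

    Field names are assumptions; verify them against the NSX API reference.
    """
    if arp_binding_limit < 1:
        raise ValueError("arp_binding_limit must be at least 1")
    return {
        "display_name": name,
        "tofu_enabled": False,  # disable Trust On First Use
        "ip_v4_discovery_options": {
            "vmtools_enabled": False,  # disable VMware Tools discovery (IPv4)
            "arp_snooping_config": {
                "arp_snooping_enabled": True,
                "arp_binding_limit": arp_binding_limit,
            },
        },
        "ip_v6_discovery_options": {
            "vmtools_v6_enabled": False,  # disable VMware Tools discovery (IPv6)
            "nd_snooping_config": {"nd_snooping_enabled": True},
        },
    }


# Example: one primary IP plus one VIP per VM gives a limit of 2.
limit = required_arp_binding_limit(primary_ips=1, vips=1)
profile = build_ip_discovery_profile("ha-ip-discovery", limit)
print(json.dumps(profile, indent=2))
```

The resulting body would still need to be submitted to NSX Manager and the profile bound to the segment. Those calls are omitted here because the exact endpoints and authentication depend on the deployment.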

Additional Information

Impact/Risks:
This configuration and behavior apply to all versions of VMware NSX.

Note:
In the case of NFV VM HA, the GARP initiated by the NFV VM HA event clears the old ARP/ND-learned IP discovery bindings.

Related KBs

Windows Failover Cluster does not work as expected when using NSX segments
