NSX Distributed and Gateway Firewalls drop VM traffic during vMotion or Storage vMotion when IP discovery is through VM Tools

Article ID: 319137


Products

VMware NSX

Issue/Introduction

Symptoms:
- During vMotion or Storage vMotion of a virtual machine, its IP address is incorrectly dropped from the firewall address set (addrset). Flows then fail to match their specific rules and may hit the default rule.
- The VM's IP address was changed at some point.
- The firewall rule source or destination is defined by a Group.
- Group membership is not defined by explicit IP address (for example, it is defined by Virtual Machine or Segment).

To verify this behavior on the Gateway Firewall, run the commands below on an Edge that hosts the relevant T0 or T1 SR:
 
To obtain the UUID of the T0 or T1 uplink SR interface, run:
get logical-routers
vrf <SR VRF#>
get interfaces
exit    (To leave VRF and return to Edge shell)

To identify the relevant addrset for the expected Allow rule, run:
get firewall <uplink interface UUID> ruleset rules
  
Check addrset membership before, during, and after Storage vMotion:
get firewall <UUID> addrset name <addrset>

Depending on which issue is hit, the IP of the VM being relocated is dropped from the address set either for the full duration of Storage vMotion, or for around 15 seconds during vMotion or Storage vMotion.
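As a rough aid for comparing the before, during, and after captures, the sketch below scans saved CLI output for the VM's IP. It assumes only that each capture was saved as plain text and that the IP appears in it as a literal IPv4 address; the exact layout of the addrset output is not assumed. The function names and the sample text are hypothetical.

```python
import ipaddress
import re

# Candidate IPv4 tokens; each one is validated with ipaddress below.
_IPV4_TOKEN = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def ip_in_capture(capture_text: str, vm_ip: str) -> bool:
    """Return True if vm_ip appears as a standalone IPv4 address in the text."""
    target = ipaddress.ip_address(vm_ip)
    for token in _IPV4_TOKEN.findall(capture_text):
        try:
            if ipaddress.ip_address(token) == target:
                return True
        except ValueError:
            continue  # token looked like an address but is not a valid one
    return False


def membership_timeline(captures: dict, vm_ip: str) -> dict:
    """Map each capture label to whether the VM's IP was present in it."""
    return {label: ip_in_capture(text, vm_ip) for label, text in captures.items()}


# Hypothetical sample text, not real CLI output.
captures = {
    "before": "10.0.0.5\n10.0.0.6\n",
    "during": "10.0.0.6\n",
    "after": "10.0.0.5\n10.0.0.6\n",
}
print(membership_timeline(captures, "10.0.0.5"))
```

A capture labelled "during" that reports the IP as absent, while "before" and "after" report it as present, is consistent with the behavior described here.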

- The IP Discovery profile on the Segment likely has the default ARP Binding Limit of 1, and the IP of the affected interface has changed at some point.
- In the Manager view of the UI, under Networking > Logical Switches > Ports > Address Bindings > Realized Bindings, the affected IP shows a Discovery Type of VM_TOOLS. If this window is refreshed continually, the IP address learned by VM_TOOLS disappears from the Realized Bindings during the outage.

Environment

VMware NSX 4.x
VMware NSX-T Data Center

Cause

In the case of the outage that lasts for the full duration of Storage vMotion, the VM Tools status changes to stopped while Storage vMotion is in progress. IP addresses discovered by VM Tools are then dropped from the Group's Effective Members and from the firewall address set, so traffic no longer matches the expected rule.
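The chain of cause and effect can be illustrated with a toy model. This is not NSX code: the `Binding` class, the function names, and the "ARP" label are hypothetical, and the model assumes that bindings sourced from VM Tools are the only ones withdrawn when VM Tools reports as stopped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    ip: str
    discovery_type: str  # "VM_TOOLS" as shown in the UI; other labels are illustrative


def effective_ips(bindings, vm_tools_running: bool) -> set:
    """IPs that stay in the address set under this simplified model."""
    return {
        b.ip
        for b in bindings
        if vm_tools_running or b.discovery_type != "VM_TOOLS"
    }


def matches_allow_rule(ip: str, addrset: set) -> bool:
    """A flow matches the specific rule only if its IP is in the address set."""
    return ip in addrset


bindings = [Binding("10.0.0.5", "VM_TOOLS")]

# Normal operation: the IP is a member, so the Allow rule matches.
print(matches_allow_rule("10.0.0.5", effective_ips(bindings, vm_tools_running=True)))

# VM Tools reported as stopped: the binding is withdrawn and the flow
# falls through to whatever rule comes next, possibly the default rule.
print(matches_allow_rule("10.0.0.5", effective_ips(bindings, vm_tools_running=False)))
```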

Resolution

The outage that lasts for the full duration of Storage vMotion is resolved in VMware NSX 4.1.0.
The ~15-30 second outage is resolved in VMware NSX 4.1.1, in which VM Tools-based IP discovery supports vMotion. ESXi must be at version 8.0 GA or later to use this feature.
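The version requirements above can be expressed as a small check. This sketch encodes only the versions stated in this article; the function names and the dotted version-string format are assumptions, and it does not account for any other releases that may carry the fixes.

```python
def parse_version(text: str) -> tuple:
    """Turn '4.1' or '4.1.1.0' into a comparable (major, minor, patch) tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '8.0'
    return tuple(parts[:3])


def fixes_available(nsx_version: str, esxi_version: str) -> dict:
    """Report which of the two fixes described in this article apply."""
    nsx = parse_version(nsx_version)
    esxi = parse_version(esxi_version)
    return {
        # Outage for the full duration of Storage vMotion: NSX 4.1.0 or later.
        "storage_vmotion_full_outage_fixed": nsx >= (4, 1, 0),
        # Short outage during vMotion: NSX 4.1.1 or later and ESXi 8.0 or later.
        "vmotion_short_outage_fixed": nsx >= (4, 1, 1) and esxi >= (8, 0, 0),
    }


print(fixes_available("4.1.0", "8.0"))
```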



Workaround:

  • Step 1. Create a new IP discovery profile and set Trust on First Use (TOFU) to off.
  • Step 2. Ensure that the new IP discovery profile has an ARP Binding Limit greater than or equal to the maximum number of IPs configured on a single port. Other settings can match the current IP discovery profile.
  • Step 3. Apply the new IP discovery profile to the segments where VMs may need to vMotion or Storage vMotion.
  • Step 4. Wait for longer than the ARP ND Binding Limit Timeout (10 minutes in the default profile). This ensures that all stale entries are aged out.
  • Step 5. Turn TOFU back on.
  • Step 6. Perform the desired vMotion or Storage vMotion.
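The values needed for Steps 1, 2, and 4 can be worked out with a short sketch. It makes no API calls. The field names in the profile dictionary (`tofu_enabled`, `arp_nd_binding_timeout`, `arp_binding_limit`) are assumed to follow the NSX Policy API's IP discovery profile schema and should be checked against the API reference for the installed version. The function names, the port inventory, and the 60-second margin are hypothetical.

```python
def required_binding_limit(ips_per_port: dict) -> int:
    """Step 2: the limit must be at least the largest IP count on any one port."""
    return max((len(ips) for ips in ips_per_port.values()), default=1)


def workaround_profile(binding_limit: int, tofu_enabled: bool,
                       timeout_minutes: int = 10) -> dict:
    """Build an illustrative profile body; field names are assumptions."""
    return {
        "tofu_enabled": tofu_enabled,
        "arp_nd_binding_timeout": timeout_minutes,
        "ip_v4_discovery_options": {
            "arp_snooping_config": {"arp_binding_limit": binding_limit},
        },
    }


def minimum_wait_seconds(timeout_minutes: int = 10, margin_seconds: int = 60) -> int:
    """Step 4: wait longer than the binding timeout; the margin is arbitrary."""
    return timeout_minutes * 60 + margin_seconds


# Hypothetical inventory of IPs configured per port.
ips_per_port = {"port-a": ["10.0.0.5", "10.0.0.6"], "port-b": ["10.0.0.7"]}

limit = required_binding_limit(ips_per_port)
step1_profile = workaround_profile(limit, tofu_enabled=False)  # Steps 1 and 2
step5_profile = workaround_profile(limit, tofu_enabled=True)   # Step 5
print(step1_profile, minimum_wait_seconds())
```

Keeping the binding limit identical between the Step 1 and Step 5 profiles means that turning TOFU back on changes only that one setting.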



Additional Information

Impact/Risks:
The VM's traffic is interrupted for the full duration of Storage vMotion, or for around 15 seconds during vMotion or Storage vMotion.