All Virtual Machines Network Connectivity Lost post NSX Upgrade

Article ID: 427371

Products

VMware NSX

Issue/Introduction

  • Following an upgrade of the NSX environment, all production servers may experience a complete loss of network connectivity.
  • The issue occurs when Edge VMs exit Maintenance Mode after the upgrade, or during post-upgrade host initialization.
  • Packet captures on the affected VM's vNIC show ARP requests being sent but no responses received.
  • Log in to the ESXi host running the VM and run the following commands:
    • Log in to the ESXi host as the root user.
    • Run the command nsxdp-cli vswitch instance list | grep -i <vm-name>
    • Make a note of the switch-port-id assigned to the VM interface.
    • Run the packet capture command pktcap-uw --switchport <switch-port-id> --capture VnicRx,VnicTx -c 15 -o - | tcpdump-uw -r - -nnee
16:18:21.270870 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
16:18:21.430823 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
16:18:21.620257 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
16:18:22.077921 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
16:18:22.226125 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
16:18:22.273179 <Source MAC Address> > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has <Destination IP Address> tell <Source-IP-Address>, length 46
  • Additionally, run the packet capture with the trace option to identify the Drop Reason and Drop Function: pktcap-uw --switchport <switch-port-id> --trace
  • In the trace output, the Drop Reason is 'MAC Forgery Drop' and the Drop Function is 'L2Sec_FilterSrcMACForgeries'.
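When a capture is long, it can help to summarize it rather than read it line by line. The following is a minimal sketch, assuming the capture was saved as text in the tcpdump-uw format shown above. The function name and the sample addresses (taken from the 192.0.2.0/24 documentation range) are illustrative and not part of any NSX tooling.

```python
import re

# Patterns for ARP lines as printed by tcpdump-uw with -nnee (assumed format).
ARP_REQUEST = re.compile(r"Request who-has (\S+) tell (\S+),")
ARP_REPLY = re.compile(r"Reply (\S+) is-at (\S+),")


def unanswered_arp_targets(lines):
    """Return {target_ip: request_count} for ARP targets that never replied."""
    requested = {}
    answered = set()
    for line in lines:
        request = ARP_REQUEST.search(line)
        if request:
            target_ip = request.group(1)
            requested[target_ip] = requested.get(target_ip, 0) + 1
            continue
        reply = ARP_REPLY.search(line)
        if reply:
            answered.add(reply.group(1))
    return {ip: count for ip, count in requested.items() if ip not in answered}


# Hypothetical capture lines; real output carries the VM's own addresses.
SAMPLE = [
    "16:18:21.270870 00:50:56:aa:bb:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Request who-has 192.0.2.1 tell 192.0.2.10, length 46",
    "16:18:21.430823 00:50:56:aa:bb:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Request who-has 192.0.2.1 tell 192.0.2.10, length 46",
]

for ip, count in unanswered_arp_targets(SAMPLE).items():
    print(f"{ip}: {count} ARP request(s), no reply seen")
```

A target that appears in this summary on a port that should have connectivity is consistent with the symptom described above, although a short capture window can also miss a legitimate reply.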

Environment

VMware NSX 3.2.2

Cause

  • Prior to NSX 3.2.2, post upgrade, the workflow responsible for refreshing uplink MAC addresses fails to trigger automatically.
  • Symptoms include the failure of the NSX segments to automatically learn new MAC addresses and the presence of "MAC Forgery Drop" errors in packet traces.
  • As a result, MAC addresses are not learned correctly, and segment security classifies the traffic as "MAC Forgery" and drops the packets immediately.
  • Because the MAC records are never learned, ARP requests time out across the segment.
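The behavior described above can be pictured with a toy model. This is a conceptual sketch only, assuming a port that accepts a frame when its source MAC is already known or when MAC learning is enabled. The class and field names are hypothetical and do not reflect the actual NSX data path.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchPort:
    """Toy model of a segment port; not NSX's real data structures."""

    known_macs: set = field(default_factory=set)
    mac_learning: bool = False

    def accept(self, src_mac: str) -> bool:
        """Return True if a frame with this source MAC would be forwarded."""
        src_mac = src_mac.lower()
        if src_mac in self.known_macs:
            return True
        if self.mac_learning:
            # Learn the previously unknown source MAC and forward the frame.
            self.known_macs.add(src_mac)
            return True
        # Unknown source MAC with learning disabled: treated as forged.
        return False


# Port still holds a stale MAC record after the upgrade (hypothetical values).
port = SwitchPort(known_macs={"00:50:56:aa:bb:01"})
print(port.accept("00:50:56:aa:bb:99"))  # unknown MAC, learning off
port.mac_learning = True
print(port.accept("00:50:56:aa:bb:99"))  # unknown MAC, learning on
```

In this model the first call is rejected and the second is accepted, which mirrors why enabling MAC Learning works around the drops.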

Resolution

  • As a workaround, enable MAC Learning on the MAC Discovery Profile used by the affected NSX segment.

    • Log in to the NSX Manager UI at https://<NSX-Manager-IP>:443 as the admin user.
    • Navigate to Networking --> Segments --> Edit the specific Segment --> Segment Profiles, and note which "MAC discovery segment profile" is in use.

  • To verify whether MAC Learning is enabled on that profile, navigate to Networking --> Segments --> Profiles and edit the MAC discovery profile noted in the previous step. Ensure that MAC Learning is enabled.
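The same check can be scripted when many segments need review. The sketch below rests on two assumptions that are not stated in this article and should be confirmed against the NSX API reference for your version: that the profile can be read from the Policy API path /policy/api/v1/infra/mac-discovery-profiles/<profile-id>, and that the returned JSON carries a mac_learning_enabled field. The helper names and the sample response are illustrative, and fetching the JSON is left to whatever HTTP client you already use.

```python
import json


def profile_url(manager: str, profile_id: str) -> str:
    """Build the (assumed) Policy API URL for a MAC discovery profile."""
    return f"https://{manager}/policy/api/v1/infra/mac-discovery-profiles/{profile_id}"


def mac_learning_enabled(profile: dict) -> bool:
    """Read the (assumed) mac_learning_enabled flag; missing means disabled."""
    return bool(profile.get("mac_learning_enabled", False))


# Hypothetical response body, trimmed to the one field of interest.
sample_response = '{"id": "example-profile", "mac_learning_enabled": false}'
profile = json.loads(sample_response)

print(profile_url("nsx-manager.example.com", profile["id"]))
if not mac_learning_enabled(profile):
    print("MAC Learning is disabled on this profile")
```

Treating a missing field as disabled is a deliberately cautious choice, so that a profile is flagged for manual review in the UI rather than silently passed.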

 

Additional Information

This issue is resolved in VMware NSX 3.2.3, 4.1.2 and above available at Broadcom downloads.