VMs lose network connectivity when redeploying an HA-enabled vCNS/NSX Edge

Article ID: 343854

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Virtual machines can lose network connectivity when you redeploy a High Availability (HA)-enabled VMware vCloud Networking and Security or NSX for vSphere Edge. This issue is resolved in VMware NSX for vSphere 6.2.4.


Symptoms:
When you upgrade or redeploy an HA-enabled VMware vCloud Networking and Security (vCNS) or NSX for vSphere Edge Services Gateway (ESG), you experience these symptoms:
  • Forwarding of network traffic is disrupted for a few seconds
  • Virtual machines lose network connectivity
  • An application on the end virtual machine does not work properly because of network disruption on a VMware vShield Edge configured for High Availability (HA)
  • A pair of VMware vShield Edges in High Availability mode experiences a split-brain scenario
  • Running the command show service highAvailability on the vShield Edges reports both Edges in the active state


Environment

VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.1.x
VMware vCloud Networking and Security 5.5.x
VMware NSX for vSphere 6.0.x

Cause

This issue occurs when the HA resource manager (Heartbeat/Pacemaker) enters a split-brain state. Split-brain happens when HA heartbeat packets are dropped for longer than the timeout period, which causes the HA status to flip.

When EdgeVm-0 is active and EdgeVm-1 is on standby, all end virtual machines learn the MAC address of EdgeVm-0. When split-brain happens, EdgeVm-1 also becomes active and sends a Gratuitous Address Resolution Protocol (GARP) packet to update the MAC address on the end virtual machines, which then start sending traffic to EdgeVm-1. Because both Edges are now active, an end virtual machine caches the MAC address of either EdgeVm-0 or EdgeVm-1, whichever responds to its ARP request first. Traffic can therefore shift between the two Edges, which results in network disruption.
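
The effect on an end virtual machine's ARP cache can be illustrated with a small simulation. This is a simplified sketch, not Edge or guest OS code: the MAC addresses are hypothetical, and "whichever Edge responds first" is modelled as a random choice among the Edges that are active.

```python
import random

def simulate_arp_cache(active_edges, rounds, seed=0):
    """Return the gateway MAC a VM caches after each ARP refresh.

    active_edges maps an Edge name to its MAC address. Every active Edge
    answers the ARP request; the VM keeps whichever reply arrives first,
    which is modelled here as a random choice (an assumption).
    """
    rng = random.Random(seed)
    macs = sorted(active_edges.values())
    return [rng.choice(macs) for _ in range(rounds)]

# Hypothetical MAC addresses, for illustration only.
EDGE_MACS = {
    "EdgeVm-0": "00:50:56:aa:00:00",
    "EdgeVm-1": "00:50:56:aa:00:01",
}

# Healthy pair: only EdgeVm-0 is active and answers ARP.
healthy = simulate_arp_cache({"EdgeVm-0": EDGE_MACS["EdgeVm-0"]}, rounds=20)

# Split-brain: both Edges are active and both answer ARP.
split_brain = simulate_arp_cache(EDGE_MACS, rounds=20)

print("healthy:", len(set(healthy)), "distinct gateway MAC(s) cached")
print("split-brain:", len(set(split_brain)), "distinct gateway MAC(s) cached")
```

In the healthy case the cached gateway MAC never changes. In the split-brain case the cached MAC can differ from one refresh to the next, which is the traffic flapping described above.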


Also, redeploying VMware vCloud Networking and Security or NSX for vSphere Edge is disruptive even with HA enabled.

Note: VMware recommends that you first apply a force sync before redeploying the Edge. Redeploy only if the force sync does not resolve the issue. For more information, see the Redeploy NSX Edge section in the NSX Administration Guide.

To minimize this disruption, the HA process pairs two virtual machines and redeploys one virtual machine at a time. Each Edge virtual machine is assigned an HA index value. The virtual machine with index 0 is always redeployed first and fails over if it is the active virtual machine. When the second virtual machine is redeployed, it also fails over.
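
The redeploy order described above can be sketched as a short model. This is an illustrative simplification, assuming that redeploying the currently active virtual machine always triggers exactly one failover to its peer; the function name and log strings are hypothetical.

```python
def redeploy_failovers(active_index):
    """Simulate a rolling redeploy of an HA pair and return an event log.

    active_index is the HA index (0 or 1) of the Edge VM that is active
    before the redeploy starts. Index 0 is always redeployed first.
    """
    events = []
    active = active_index
    for index in (0, 1):
        if index == active:
            # Redeploying the active VM forces the peer to take over.
            peer = 1 - index
            events.append(f"redeploy index {index}: failover to index {peer}")
            active = peer
        else:
            events.append(f"redeploy index {index}: standby, no failover")
    return events

for start in (0, 1):
    print(f"active index before redeploy: {start}")
    for event in redeploy_failovers(start):
        print("  " + event)
```

Under this model, a pair whose index 0 virtual machine is active goes through two failovers during a redeploy, and a pair whose index 1 virtual machine is active goes through one. Either way, the redeploy is disruptive even with HA enabled.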

For more information, see Troubleshooting NSX Edge High Availability (HA) issues (2126560).

Resolution

This issue is resolved in VMware NSX for vSphere 6.2.4, available at VMware Downloads.

If you are unable to upgrade at this time, follow the workaround.


To work around this issue, consider decreasing the Declare Dead Time setting from the default value of 15 seconds.

Note: This workaround only minimizes the downtime. It does not resolve the issue.

If a heartbeat is not received from the active Edge within the specified time, the active Edge is declared dead. The standby Edge then moves to the active state, takes over the interface configuration of the primary appliance, and starts the NSX Edge services that were running on the primary appliance. When the switchover takes place, a system event is displayed in the System Events tab within Settings & Reports.
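
The takeover decision can be sketched as a simple timeout check. This is a minimal illustration, not the Edge HA implementation: the function name is hypothetical, the 9-second value is an arbitrary example, and treating the boundary as a strict "greater than" comparison is an assumption.

```python
def standby_should_take_over(last_heartbeat, now, declare_dead_time=15.0):
    """Return True when the standby Edge would declare the active Edge dead.

    last_heartbeat and now are timestamps in seconds. The default of
    15 seconds matches the default Declare Dead Time in this article.
    """
    return (now - last_heartbeat) > declare_dead_time

# A heartbeat gap of 10 seconds, checked against two settings.
gap_seconds = 10.0
for dead_time in (15.0, 9.0):
    takeover = standby_should_take_over(0.0, gap_seconds, dead_time)
    print(f"Declare Dead Time {dead_time:>4}s, gap {gap_seconds}s -> takeover: {takeover}")
```

A lower Declare Dead Time makes the standby Edge take over sooner after heartbeats stop, which is why the workaround shortens downtime rather than removing the underlying cause.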

Before proceeding with the workaround:

  1. Confirm that a split-brain scenario is occurring by verifying that both vShield Edges report the active state.
  2. Ensure that the two vShield Edge virtual machines can communicate (ping) with each other through the High Availability (HA) interface.

    Note: If the two vShield Edge virtual machines cannot communicate, the split-brain scenario is caused by a network issue. Repair the network, then wait to see whether the Edge pair resolves the split-brain automatically. This usually resolves the issue, and no further action is needed.

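The two checks above can be summarized as a small decision helper. This is only a sketch of the reasoning: the function names and return strings are hypothetical, the HA states are assumed to be read manually from each Edge and entered as "active" or "standby", and the ping result is assumed to be supplied by the operator.

```python
def is_split_brain(state_edge0, state_edge1):
    """Return True when both Edge VMs report themselves as active."""
    states = (state_edge0.strip().lower(), state_edge1.strip().lower())
    return states == ("active", "active")

def next_step(state_edge0, state_edge1, ha_link_ok):
    """Suggest the next action based on the checks in this article.

    ha_link_ok is True when the two Edge VMs can ping each other over
    the HA interface.
    """
    if not is_split_brain(state_edge0, state_edge1):
        return "no split-brain detected"
    if not ha_link_ok:
        return "repair the HA network and wait for the pair to recover"
    return "proceed with the Declare Dead Time workaround"

print(next_step("active", "standby", ha_link_ok=True))
print(next_step("active", "active", ha_link_ok=False))
print(next_step("active", "active", ha_link_ok=True))
```
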
To configure heartbeat settings:
  1. Log in to the vSphere Web Client.
  2. Click Networking & Security and then click NSX Edges.
  3. Double-click an NSX Edge appliance.
  4. Click the Manage tab and then click the Settings tab.
  5. In the HA Configuration panel, click Change.
  6. In the Change HA Configuration dialog, enter the new value in the Declare Dead Time field. The default is 15 seconds.
  7. Click OK.

As a best practice, VMware recommends configuring a dedicated vNIC/pNIC for the HA interface.


Additional Information

Troubleshooting NSX Edge High Availability (HA) issues
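Virtual machines lose network connectivity when redeploying an HA-enabled vCNS/NSX (Simplified Chinese version)
Virtual machines are disconnected from the network when redeploying an HA-enabled vCNS/NSX (Japanese version)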