Data plane outage during control plane outage


Article ID: 324869

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
In an environment where the DLR Control VM is deployed either in HA mode or standalone, and dynamic routing is configured, you see these symptoms:
  • The data plane experiences an outage, and dynamic routes are no longer present on the ESXi hosts.
  • The DLR Control VM sends an explicit 'leave' message to the Controller, which removes the dynamic routes from the ESXi hosts. With no routes left on the hosts, the data plane experiences an outage in the North-South direction.
  • In the NSX Controller logs, you see entries similar to:

    2018-01-19 20:45:08,525 62097502 [vdr worker 0] INFO com.vmware.controller.apps.vdr.VdrService  - VSE Connection [ip=10.254.33.27:44761, cnnId=2] leave vdr 0x23a8 locale 00000000-0000-0000-0000-000000000000, explicit leave true

    2018-05-10 03:52:37,947 43058651830 [vdr worker 3] INFO com.vmware.controller.apps.vdr.config.RouteConfig  - vdrId <vdr_id> is in softflush state and EOM is received

    2018-01-19T20:45:39.950105+00:00 2018-05-10 03:52:39,949 43058653832 [vdr worker 3] INFO com.vmware.controller.apps.vdr.VdrService  - Closing vdr <vdr_id> for connection Connection [ip=10.254.33.24:40435, cnnId=1790]


    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
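    To check a saved copy of the NSX Controller log for these entries, a short script can help. The following is a minimal sketch, not a VMware tool: the function name find_symptoms and the pattern names are hypothetical, and the regular expressions are derived only from the example lines above, so they may need adjusting for other log formats.

```python
import re

# Patterns derived from the example log lines in this article (assumption:
# the message text is stable across controller builds).
SYMPTOM_PATTERNS = {
    "explicit_leave": re.compile(r"leave vdr (?P<vdr>\S+) .*explicit leave true"),
    "softflush_eom": re.compile(
        r"vdrId (?P<vdr>\S+) is in softflush state and EOM is received"
    ),
    "closing_vdr": re.compile(r"Closing vdr (?P<vdr>\S+) for connection"),
}


def find_symptoms(lines):
    """Return (line_number, symptom_name, vdr_id) for each matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in SYMPTOM_PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((number, name, match.group("vdr")))
                break  # one symptom per line is enough
    return hits


if __name__ == "__main__":
    # Sample input copied from the excerpts above; replace with real log lines.
    sample = [
        "2018-01-19 20:45:08,525 62097502 [vdr worker 0] INFO "
        "com.vmware.controller.apps.vdr.VdrService  - VSE Connection "
        "[ip=10.254.33.27:44761, cnnId=2] leave vdr 0x23a8 locale "
        "00000000-0000-0000-0000-000000000000, explicit leave true",
    ]
    for hit in find_symptoms(sample):
        print(hit)
```

    An "explicit_leave" hit followed shortly by "softflush_eom" and "closing_vdr" hits for the same DLR matches the sequence shown in the excerpts.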


Environment

VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.2.x

Cause

This issue occurs when the DLR Control VM is disconnected from the NSX Controller cluster (for example, through a shutdown or a switchover). The NSX Controller cluster then initiates a flush of the dynamic routes associated with that DLR Control VM on all the ESXi hosts.

Resolution

This issue is resolved in:

Note: After the fix, the NSX Controller cluster keeps the routes even if the DLR Control VM is disconnected from the Controller or a switchover to the standby DLR Control VM occurs.
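
The difference between the two behaviors can be illustrated with a toy model. This is a conceptual sketch only and does not reflect the actual NSX Controller implementation: the class ToyController, its methods, and the sample route are all hypothetical.

```python
class ToyController:
    """Toy model of a controller that tracks dynamic routes pushed to hosts."""

    def __init__(self, retain_routes_on_disconnect):
        # False models the behavior described under Cause;
        # True models the behavior described in the Note above.
        self.retain_routes_on_disconnect = retain_routes_on_disconnect
        self.host_routes = {}  # vdr_id -> set of route prefixes

    def learn(self, vdr_id, prefix):
        """Record a dynamic route reported by the DLR Control VM."""
        self.host_routes.setdefault(vdr_id, set()).add(prefix)

    def control_vm_disconnected(self, vdr_id):
        """Handle the DLR Control VM leaving (shutdown, switchover, ...)."""
        if not self.retain_routes_on_disconnect:
            # Flush: the hosts lose every dynamic route for this DLR.
            self.host_routes.pop(vdr_id, None)

    def routes(self, vdr_id):
        return sorted(self.host_routes.get(vdr_id, set()))


if __name__ == "__main__":
    for retain in (False, True):
        controller = ToyController(retain_routes_on_disconnect=retain)
        controller.learn("0x23a8", "0.0.0.0/0")  # hypothetical default route
        controller.control_vm_disconnected("0x23a8")
        print(retain, controller.routes("0x23a8"))
```

In the flushing case the route list is empty after the disconnect, which corresponds to the North-South outage described in the symptoms. In the retaining case the route stays in place.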

Additional Information

Impact/Risks:
A control plane outage can affect the data plane by removing dynamically learned routes from the DLR instance.