Symptoms: In an environment where the DLR Control VM is deployed, either in HA mode or standalone, with dynamic routing configured, you see these symptoms:
The data plane experiences an outage and dynamic routes are no longer present on the ESXi hosts.
When the DLR Control VM sends an explicit ‘leave’ message to the Controller, the dynamic routes are removed from the ESXi hosts. With no routes on the hosts, the data plane suffers an outage in the North-South direction.
In the NSX Controller logs, you see entries similar to:
2018-05-10 03:52:37,947 43058651830 [vdr worker 3] INFO com.vmware.controller.apps.vdr.config.RouteConfig - vdrId <vdr_id> is in softflush state and EOM is received
2018-01-19T20:45:39.950105+00:00 2018-05-10 03:52:39,949 43058653832 [vdr worker 3] INFO com.vmware.controller.apps.vdr.VdrService - Closing vdr <vdr_id> for connection Connection [ip=10.254.33.24:40435, cnnId=1790]
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
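To check a saved copy of the NSX Controller log for these entries, a small script along the following lines may help. This is a minimal sketch, not a VMware tool: the function name find_route_flush_events is hypothetical, and the two patterns are copied from the example entries above on the assumption that the message wording is the same in your build.

```python
import re

# Patterns derived from the two example log entries above.
# Assumption: the message wording is identical in your Controller build.
SOFTFLUSH_RE = re.compile(r"vdrId (\S+) is in softflush state and EOM is received")
CLOSING_RE = re.compile(r"Closing vdr (\S+) for connection")


def find_route_flush_events(lines):
    """Return (kind, vdr_id, line) tuples for lines matching either pattern.

    `lines` is any iterable of strings, for example an open log file.
    """
    events = []
    for line in lines:
        line = line.rstrip("\n")
        match = SOFTFLUSH_RE.search(line)
        if match:
            events.append(("softflush", match.group(1), line))
            continue
        match = CLOSING_RE.search(line)
        if match:
            events.append(("closing", match.group(1), line))
    return events
```

Pass it an open file object for the log you collected and print the returned tuples. A "softflush" entry followed shortly by a "closing" entry for the same vdr ID matches the sequence shown in the examples.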
Environment
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.2.x
Cause
This issue occurs when the DLR Control VM is disconnected from the NSX Controller cluster (for example, by a shutdown or switchover). The NSX Controller cluster then initiates a flush of the dynamic routes related to that DLR Control VM on all ESXi hosts.
Resolution
This issue is resolved in:
VMware NSX for vSphere 6.3.5 and later versions, available at VMware Downloads.
VMware NSX for vSphere 6.4.0 and later versions, available at VMware Downloads.
Note: After the fix, the NSX Controller cluster keeps the routes even if the DLR Control VM is disconnected from the Controller or a switchover to the standby DLR Control VM occurs.
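The version thresholds listed above can be expressed as a simple comparison. The sketch below is illustrative only: the helper names parse_version and has_route_flush_fix are hypothetical, and it assumes the version is given as a plain dotted string such as "6.3.4", without a build number.

```python
def parse_version(text):
    """Convert a dotted version string such as '6.3.5' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def has_route_flush_fix(version):
    """Return True if the version is 6.3.5 or later (this includes 6.4.0 and later).

    Assumption: input is a plain dotted numeric string; build numbers
    and suffixes are not handled.
    """
    return parse_version(version) >= (6, 3, 5)
```

For example, has_route_flush_fix("6.3.4") and has_route_flush_fix("6.2.9") both evaluate to False, while "6.3.5" and "6.4.0" evaluate to True.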
Additional Information
Impact/Risks: A control plane outage can affect the data plane by removing dynamically learned routes from the DLR instance.