Symptoms:
Loss of connectivity is observed for VMs which have been vMotioned.
In toragent.log on the controller that is the physical primary for the affected ToR, errors like the following are repeated:
2019-06-14 12:43:48,054 | DEBUG | nioEventLoopGroup-3-1 | JsonRpcEndpoint | Response : {"id":"1be28e7a-e548-41b2-####-############","result":[{"count":1},{},{"details":"Table Ucast_Macs_Remote column locator row d2e2f641-b925-42d7-####-########## references nonexistent row 22054b92-7c1c-4622-####-######## in table Physical_Locator.","error":"referential integrity violation"}],"error":null}
2019-06-14 12:43:48,155 | ERROR | pool-6-thread-10 | TorInorderMessageProcessor | Error when processing a QueueElement
com.vmware.toragent.tormgr.util.TransactionException: ExceptionMessage: Error updating row
OperationsRequested: 2
OperationsExecuted: 3
    at com.vmware.toragent.tormgr.southbound.TorClient.executeTransaction(TorClient.java:2012)
    at com.vmware.toragent.tormgr.southbound.TorClient.executeTransaction(TorClient.java:1984)
    at com.vmware.toragent.tormgr.southbound.UcastMacEvent.updateUcastMacRemoteRow(UcastMacEvent.java:132)
    at com.vmware.toragent.tormgr.southbound.UcastMacEvent.processUpdate(UcastMacEvent.java:73)
    at com.vmware.toragent.tormgr.southbound.SouthboundEvent.processElement(SouthboundEvent.java:79)
    at com.vmware.toragent.tormgr.lib.TorInorderMessageProcessor$TorInorderQueueElement.processElement(TorInorderMessageProcessor.java:261)
    at com.vmware.toragent.tormgr.lib.QueueElementRunner.run(TorQueueElementProcessor.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
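To check whether a copy of toragent.log contains this signature, the log can be scanned for the "referential integrity violation" response. The following Python sketch is illustrative only and is not a VMware tool. It assumes the log lines have the same layout as the DEBUG sample above (pipe-separated fields, with the JSON payload after "Response : "), and the function name find_integrity_violations is hypothetical.

```python
import json
import re

# Matches the "details" text seen in the sample OVSDB response above.
DETAIL_RE = re.compile(
    r"Table (?P<table>\S+) column (?P<column>\S+) row (?P<row>\S+) "
    r"references nonexistent row (?P<missing_row>\S+) in table (?P<ref_table>\w+)"
)


def find_integrity_violations(lines):
    """Yield one dict per referential integrity violation found in log lines.

    Assumption: each response is logged on a single line, as in the sample.
    """
    for line in lines:
        if "referential integrity violation" not in line:
            continue
        # The JSON payload follows the "Response : " marker in the sample line.
        _, marker, payload = line.partition("Response : ")
        if not marker:
            continue
        try:
            response = json.loads(payload)
        except json.JSONDecodeError:
            continue  # Truncated or non-JSON payload; skip it.
        for result in response.get("result") or []:
            if not isinstance(result, dict):
                continue
            if result.get("error") != "referential integrity violation":
                continue
            match = DETAIL_RE.search(result.get("details", ""))
            if match:
                # First pipe-separated field is the timestamp.
                timestamp = line.split(" | ")[0]
                yield {"timestamp": timestamp, **match.groupdict()}
```

To use it, open the log file, pass the file object to find_integrity_violations, and print each result. Repeated hits that name Ucast_Macs_Remote as the table and Physical_Locator as the referenced table correspond to the symptom described here.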
VMware NSX Data Center for vSphere 6.4.x
VMware NSX Data Center for vSphere 6.3.x
There is an inconsistency between the NSX controllers and the ToR OVSDB: the controller refers to a nonexistent table entry in the ToR OVSDB.
This affects VMs which are on logical switches that are bound to the ToR which the controller is unable to update.
If there are multiple hardware VTEPs, only the logical switches that are bound to the affected ToR will be impacted.
The controller fails to update the ToR tables with the VM's MAC location when the VM is vMotioned from one host to another.
This will impact the datapath for the VM, as the hardware VTEP will not know which host the VM now resides on.
Thus there will be loss of connectivity for VMs which have been vMotioned.
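The failure mode can be illustrated with a simplified model. The Python sketch below is not the OVSDB or NSX implementation. It uses plain dictionaries as stand-ins for the Physical_Locator and Ucast_Macs_Remote tables, and all names, addresses, and identifiers in it are hypothetical. It shows how an update that points a MAC entry at a locator row the ToR does not have is rejected, leaving the MAC mapped to the old host.

```python
class ReferentialIntegrityViolation(Exception):
    """Raised when a row would reference a locator row that does not exist."""


def update_ucast_mac_remote(physical_locators, ucast_macs_remote, mac, locator_id):
    """Point a MAC entry at a locator row, rejecting dangling references."""
    if locator_id not in physical_locators:
        raise ReferentialIntegrityViolation(
            f"entry for {mac} references nonexistent row {locator_id} "
            "in table Physical_Locator"
        )
    ucast_macs_remote[mac] = locator_id


# Toy ToR state: only the locator for the source host is present.
physical_locators = {"loc-host-a": "192.0.2.10"}
ucast_macs_remote = {"00:50:56:00:00:01": "loc-host-a"}

# After a vMotion to host B, the update names a locator the ToR lacks.
try:
    update_ucast_mac_remote(
        physical_locators, ucast_macs_remote, "00:50:56:00:00:01", "loc-host-b"
    )
except ReferentialIntegrityViolation as exc:
    update_error = str(exc)

# The MAC entry still points at the old host, so traffic is sent to the wrong place.
stale_locator = ucast_macs_remote["00:50:56:00:00:01"]
```

In this model the rejected update leaves stale_locator set to the source host's locator, which mirrors why the hardware VTEP keeps forwarding to the host the VM has left.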
This issue is resolved in NSX Data Center for vSphere 6.4.2.
Workaround:
Detach the controller from the affected ToR and re-attach the controller.
Note: This will impact all network traffic that goes through the ToR, so plan a maintenance window before applying this workaround.