VMs lose northbound connectivity through hardware VTEP after vMotion


Article ID: 309670


Updated On:

Products

VMware NSX Data Center for vSphere

Issue/Introduction

Symptoms:


Loss of connectivity is observed for VMs which have been vMotioned.

In toragent.log on the controller that is the physical primary for the ToR in question, errors like the following are logged repeatedly:
2019-06-14 12:43:48,054 | DEBUG | nioEventLoopGroup-3-1 | JsonRpcEndpoint | Response : {"id":"1be28e7a-e548-41b2-####-############","result":[{"count":1},{},{"details":"Table Ucast_Macs_Remote column locator row d2e2f641-b925-42d7-####-########## references nonexistent row 22054b92-7c1c-4622-####-######## in table Physical_Locator.","error":"referential integrity violation"}],"error":null}

2019-06-14 12:43:48,155 | ERROR | pool-6-thread-10 | TorInorderMessageProcessor | Error when processing a QueueElement
com.vmware.toragent.tormgr.util.TransactionException: ExceptionMessage: Error updating rowOperationsRequested: 2OperationsExecuted: 3
at com.vmware.toragent.tormgr.southbound.TorClient.executeTransaction(TorClient.java:2012)
at com.vmware.toragent.tormgr.southbound.TorClient.executeTransaction(TorClient.java:1984)
at com.vmware.toragent.tormgr.southbound.UcastMacEvent.updateUcastMacRemoteRow(UcastMacEvent.java:132)
at com.vmware.toragent.tormgr.southbound.UcastMacEvent.processUpdate(UcastMacEvent.java:73)
at com.vmware.toragent.tormgr.southbound.SouthboundEvent.processElement(SouthboundEvent.java:79)
at com.vmware.toragent.tormgr.lib.TorInorderMessageProcessor$TorInorderQueueElement.processElement(TorInorderMessageProcessor.java:261)
at com.vmware.toragent.tormgr.lib.QueueElementRunner.run(TorQueueElementProcessor.java:78)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
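
To check whether a log shows this symptom, the JSON-RPC response lines can be scanned for the "referential integrity violation" error. The following is a minimal sketch, assuming the log lines have the same layout as the excerpt above (a `Response : ` marker followed by a JSON payload). The function name `find_integrity_violations` and the regular expression are illustrative and are not part of any NSX tooling.

```python
import json
import re

# Matches the "details" text shown in the excerpt above. The pattern is an
# assumption based on that single sample line.
DETAILS_RE = re.compile(
    r"Table (?P<table>\S+) column (?P<column>\S+) row (?P<row>\S+) "
    r"references nonexistent row (?P<missing>\S+) in table (?P<target>\w+)"
)


def find_integrity_violations(lines):
    """Yield one dict per referential integrity violation found in the lines."""
    for line in lines:
        if "referential integrity violation" not in line:
            continue
        # The JSON-RPC response follows the "Response : " marker.
        _, sep, payload = line.partition("Response : ")
        if not sep:
            continue
        try:
            response = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if not isinstance(response, dict):
            continue
        for item in response.get("result") or []:
            if not isinstance(item, dict):
                continue
            if item.get("error") != "referential integrity violation":
                continue
            match = DETAILS_RE.search(item.get("details", ""))
            if match:
                yield match.groupdict()
```

Passing an open file handle for a copy of toragent.log to `find_integrity_violations` yields the table, column, row, and missing row named in each violation. Repeated hits that name `Ucast_Macs_Remote` and `Physical_Locator` match the symptom described in this article.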

 

 

Environment

VMware NSX Data Center for vSphere 6.4.x
VMware NSX Data Center for vSphere 6.3.x

Cause

There is an inconsistency between the NSX controllers and the ToR OVSDB: the controller refers to a nonexistent table entry in the ToR OVSDB.

This affects VMs which are on logical switches that are bound to the ToR which the controller is unable to update.
If there are multiple hardware VTEPs, only the logical switches that are bound to the affected ToR are impacted.

The controller fails to update the ToR tables with the VM's MAC location when the VM is vMotioned from one host to another.
This impacts the datapath for the VM, because the hardware VTEP does not know which host the VM now resides on.
As a result, VMs that have been vMotioned lose connectivity.
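
The failure mode can be illustrated with a toy model. This sketch is not the OVSDB or NSX implementation. The dictionaries stand in for the `Physical_Locator` and `Ucast_Macs_Remote` tables named in the log, and the identifiers, MAC address, and IP address are hypothetical. It shows why a rejected update leaves the MAC entry pointing at the old host.

```python
class ReferentialIntegrityError(Exception):
    """Raised when an update references a locator row that does not exist."""


def update_mac_locator(db, mac, locator_id):
    """Point a MAC entry at a locator row, rejecting dangling references."""
    if locator_id not in db["Physical_Locator"]:
        raise ReferentialIntegrityError(
            f"row for {mac} references nonexistent row {locator_id} "
            "in table Physical_Locator"
        )
    db["Ucast_Macs_Remote"][mac] = locator_id


def simulate_vmotion(db, mac, new_locator_id):
    """Return True if the ToR tables were updated, False if the update failed."""
    try:
        update_mac_locator(db, mac, new_locator_id)
        return True
    except ReferentialIntegrityError:
        return False


# Hypothetical ToR state: the VM's MAC currently points at host A's locator.
tor_db = {
    "Physical_Locator": {"loc-host-a": "192.0.2.10"},
    "Ucast_Macs_Remote": {"00:50:56:00:00:01": "loc-host-a"},
}

# The VM moves to host B, but the controller references a locator row that
# the ToR does not have, so the update is rejected.
updated = simulate_vmotion(tor_db, "00:50:56:00:00:01", "loc-host-b")
```

In this model `updated` is `False` and the MAC entry still references `loc-host-a`, which corresponds to the hardware VTEP continuing to send traffic toward the host the VM has left.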

Resolution

This issue is resolved in NSX Data Center for vSphere 6.4.2.

Workaround:
Detach the controller from the affected ToR, then re-attach it.
Note: This impacts all network traffic that goes through the ToR, so plan a maintenance window before applying this workaround.

Additional Information

Impact/Risks:
If the issue is encountered, VMs may lose northbound connectivity after vMotion.