All IPsec VPN sessions behind a Local Endpoint go Down with error message: "Local Endpoint IP not bound to interface".
Article ID: 397403
Products
VMware NSX
Issue/Introduction
After a configuration change to a logical router (for example, adding a new segment/network or a new Service IP), all IPsec VPN sessions configured to use a particular local endpoint on a service interface attached to that logical router show as Down.
The session and tunnel status in the NSX UI gives the Down reason as: "Local Endpoint IP not bound to interface".
Environment
VMware NSX-T Data Center >= 3.2.1
VMware NSX 4.x
Cause
When a Service IP used as an IPsec VPN local endpoint is created in an NSX-T environment running a version older than 3.2.1.x, and that environment is then upgraded to 3.2.1.x, data is migrated from a now-unused corfu table to a newly created corfu table called "LrServiceIpInternal". This migration must take place at version 3.2.1.x (or later), because an optimization made to the Service IP framework left the older table incompatible.
During the migration to the new "LrServiceIpInternal" corfu table, some Service IPs configured on Tier-1 and Tier-0 gateways can end up missing the value of the payload object "lrResourceId". This object should be populated with the UUID of the logical router where the Service IP is configured. See below for an example of a "LrServiceIpInternal" table entry that is missing this data:
Payload:
{
  "lrResourceId": {        <----- lr resource id should not be empty
  },
  "l3EntityId": {
    "l3EntityId": {
      "left": "77####208896####540",
      "right": "13####8118597####252"
    },
    "l3EntityType": "L3_ENTITY_TYPE_IPSEC_LOCAL_EDP"
  },
  "serviceIp": [{
    "ipAddressConfig": {
      "ipv4": 3##95###0
    },
    "serviceEntityConfig": {
      "operationMask": "4##4"
    }
  }]
}
Metadata:
{
  "createTime": "1###8639###07",        <----- Time stamp of upgrade from version less than 3.2.1 to version 3.2.1.x
  "createUser": "system",
  "lastModifiedTime": "1###8639###07",
  "lastModifiedUser": "system",
  "productVersion": "3.2.1.x"        <------- Record showing the "LrServiceIpInternal" table entry was created as a result of the corfu table migration that occurred during the upgrade to 3.2.1.x
}
Because the "lrResourceId" value is empty for the Service IP used as an IPsec VPN local endpoint, any re-realization of that logical router clears the Service IP from the port, which results in the "Local Endpoint IP not bound to interface" error message.
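The condition described above can be expressed as a simple check. The following is a minimal Python sketch, assuming the payload of a "LrServiceIpInternal" entry has already been obtained and parsed into a dictionary; how the table is exported is not covered here, and the function name and the sample values are hypothetical.

```python
def is_missing_lr_resource_id(payload: dict) -> bool:
    """Return True when the entry has no usable "lrResourceId".

    Assumption: `payload` is the "Payload" part of one table entry,
    already parsed into a Python dict. A missing key, None, or an
    empty object are all treated as "missing".
    """
    return not payload.get("lrResourceId")


# Hypothetical entries for illustration only; the IDs are made up.
affected = {
    "lrResourceId": {},
    "l3EntityId": {"l3EntityType": "L3_ENTITY_TYPE_IPSEC_LOCAL_EDP"},
}
healthy = {
    "lrResourceId": {"left": "1", "right": "2"},
    "l3EntityId": {"l3EntityType": "L3_ENTITY_TYPE_IPSEC_LOCAL_EDP"},
}

flagged = [name for name, entry in (("affected", affected), ("healthy", healthy))
           if is_missing_lr_resource_id(entry)]
print(flagged)
```

An entry flagged by such a check would be a candidate for the reprocess workaround described in this article.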
Almost any configuration change to the logical router will result in a "re-realization" event. Configuring a new VPN session or adding a new Segment and connecting it to the logical router are two examples of actions that will result in a "re-realization".
This issue can affect logical routers that have been upgraded one or more times AFTER being upgraded to 3.2.1.x, since the "lrResourceId" value within the affected corfu table will remain empty through these upgrades.
Resolution
Resolution:
This issue is resolved in VMware NSX-T Data Center 3.2.2.
When upgrading from a version earlier than 3.2.1.x, skipping 3.2.1.x and upgrading directly to version 3.2.2 or later avoids this issue.
Workaround:
To prevent the outage from occurring at a future "re-realization" event, run the reprocess API against each logical router after the upgrade to version 3.2.1.x. This recalculates the Service IPs and corrects the corfu table entry. The API call to perform the reprocess is:
POST https://{{managerIP}}/api/v1/logical-routers/<logical-router-id>?action=reprocess
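As an illustration, the call above could be scripted as follows. This is a minimal sketch using only the Python standard library. It assumes HTTP basic authentication against the NSX Manager, and the function names, host name, router ID, and credentials are placeholders rather than values from this article.

```python
import base64
import urllib.request
from urllib.parse import quote


def reprocess_url(manager: str, router_id: str) -> str:
    """Build the reprocess URL for one logical router."""
    return (
        f"https://{manager}/api/v1/logical-routers/"
        f"{quote(router_id, safe='')}?action=reprocess"
    )


def build_reprocess_request(manager: str, router_id: str,
                            user: str, password: str) -> urllib.request.Request:
    """Prepare the POST request (assumes basic authentication)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        reprocess_url(manager, router_id),
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def reprocess(manager: str, router_id: str, user: str, password: str,
              context=None) -> int:
    """Send the request and return the HTTP status code.

    `context` is an optional ssl.SSLContext, for example one that
    trusts the manager's certificate authority.
    """
    request = build_reprocess_request(manager, router_id, user, password)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status


# Placeholder values; nothing is sent until reprocess() is called.
print(reprocess_url("nsx-manager.example.com", "<logical-router-id>"))
```

Calling `reprocess(...)` once per logical router ID would cover every router in the environment; the list of router IDs has to be gathered separately.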