[VMC] HCX L2E VMs lose connectivity to source gateway post NSX Manager upgrade
Article ID: 429829
Products
VMware Cloud on AWS
Issue/Introduction
During a VMware Cloud on AWS (VMC) SDDC upgrade from 1.22v10 to 1.24v5, in which NSX Manager is upgraded from 4.1.0 to 4.1.2, HCX L2E (Layer 2 Extension) VMs on the destination/cloud side with MON (Mobility Optimized Networking) disabled may lose connectivity to the source-side gateway.
VMs hosted on the source/connector side can still reach the gateway.
HCX L2E VMs on the destination/cloud side can still communicate with VMs on the same segment at the source/connector side.
Environment
VMware Cloud on AWS
Cause
The affected L2E segments on the HCX destination/cloud side SDDC are missing the property "com.vmware.nsx.port.extraConfig.remoteRtr". Without this property, HCX cannot translate the MAC address, so packets are forwarded to the VDR port instead of the NE (Network Extension) appliance port.
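As an illustration of how the missing property might be spotted, the following minimal sketch scans Logical Switch records that have already been exported as Python dictionaries and lists those without the property. The record layout (an "extra_configs" list whose entries hold a "config_pair" with "key" and "value") is an assumption about the exported data, and the function names and sample values are hypothetical; verify the actual payload in your environment before relying on it.

```python
# Property name taken from this article.
REMOTE_RTR_KEY = "com.vmware.nsx.port.extraConfig.remoteRtr"


def has_remote_rtr(logical_switch: dict) -> bool:
    """Return True if the record carries the remoteRtr extra-config key.

    Assumption: extra configs are stored as
    {"extra_configs": [{"config_pair": {"key": ..., "value": ...}}, ...]}.
    """
    for entry in logical_switch.get("extra_configs") or []:
        pair = entry.get("config_pair") or {}
        if pair.get("key") == REMOTE_RTR_KEY:
            return True
    return False


def switches_missing_remote_rtr(logical_switches: list) -> list:
    """Return display names of records that lack the remoteRtr key."""
    return [
        ls.get("display_name", "<unnamed>")
        for ls in logical_switches
        if not has_remote_rtr(ls)
    ]


# Hypothetical sample data; names and the property value are placeholders.
sample = [
    {
        "display_name": "L2E_example_a",
        "extra_configs": [
            {"config_pair": {"key": REMOTE_RTR_KEY, "value": "placeholder"}}
        ],
    },
    {"display_name": "L2E_example_b"},
]

missing = switches_missing_remote_rtr(sample)
print(missing)
```

Only the second sample record is reported, because it has no extra-config entry with the key. The sketch inspects data only and does not change any NSX object.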
Underlying cause of the missing property "com.vmware.nsx.port.extraConfig.remoteRtr":
For all NSX versions lower than 4.1.1, HCX sets the property "com.vmware.nsx.port.extraConfig.remoteRtr" directly on the Logical Switch (NSX Manager Mode entity) and not on the NSX Segment (NSX Policy Mode entity).
Additionally, in NSX version 4.1.0 and lower, certain segments may have their Transport Zone (TZ) linked only to the NSX Logical Switch and not to the NSX Segment.
To correct this discrepancy, a task named "SegmentTzUpdateMigrationTask" runs during the NSX Manager upgrade.
During the NSX upgrade from 4.1.0 to 4.1.2, the task "SegmentTzUpdateMigrationTask" runs for all segments that do not have their TZ linked in the NSX Segment. Because the NSX Segment objects for these segments do not carry the property "com.vmware.nsx.port.extraConfig.remoteRtr", the task overwrites the NSX Logical Switch and removes the property "com.vmware.nsx.port.extraConfig.remoteRtr" from it.
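The sequence described above can be pictured with a toy model. This is not NSX code: the class, field names, and function below are hypothetical and only mimic the described behavior, in which a segment whose TZ is not linked on the Policy side is rewritten from its Policy object, so that anything present only on the Logical Switch is lost.

```python
from dataclasses import dataclass, field

REMOTE_RTR_KEY = "com.vmware.nsx.port.extraConfig.remoteRtr"


@dataclass
class ToySegment:
    """Hypothetical stand-in for one segment and its Logical Switch."""
    name: str
    policy_has_tz: bool                               # TZ linked on the NSX Segment?
    policy_extra: dict = field(default_factory=dict)  # properties on the NSX Segment
    switch_extra: dict = field(default_factory=dict)  # properties on the Logical Switch


def simulate_tz_migration(seg: ToySegment) -> ToySegment:
    """Toy version of the behavior attributed to SegmentTzUpdateMigrationTask."""
    if seg.policy_has_tz:
        return seg  # TZ already linked: the task does not touch this segment
    seg.policy_has_tz = True
    # The Policy-side view overwrites the Logical Switch, dropping
    # properties that existed only on the switch.
    seg.switch_extra = dict(seg.policy_extra)
    return seg


# Property was set only on the Logical Switch (NSX lower than 4.1.1)
# and the TZ was not linked on the NSX Segment.
affected = ToySegment(
    name="l2e-affected",
    policy_has_tz=False,
    switch_extra={REMOTE_RTR_KEY: "placeholder"},
)
# TZ already linked on the NSX Segment, so the task skips this segment.
unaffected = ToySegment(
    name="l2e-unaffected",
    policy_has_tz=True,
    switch_extra={REMOTE_RTR_KEY: "placeholder"},
)

simulate_tz_migration(affected)
simulate_tz_migration(unaffected)
print(REMOTE_RTR_KEY in affected.switch_extra)
print(REMOTE_RTR_KEY in unaffected.switch_extra)
```

In the model, only the segment that combines an unlinked TZ with a switch-only property loses it, which matches the conditions under which the issue is reported.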
Resolution
If this issue arises, the fix is to unextend and re-extend the affected HCX L2E segments from the HCX source/connector side. Note: Do not perform this operation from the HCX destination/cloud side, as that is not supported and may cause issues. Re-extending the segments re-injects the property "com.vmware.nsx.port.extraConfig.remoteRtr".
If the fix is not possible or must be delayed, a potential workaround is to temporarily migrate the VMs back to the source/connector side.
If root cause analysis (RCA) or confirmation is required, raise a Wolken case with the VMware Cloud on AWS team before applying the fix to all the affected HCX L2E segments (Get Support).
This issue should not arise in SDDCs of version 1.24v5 and above (during future upgrades or in steady state), because from SDDC version 1.24v5 (which has NSX version 4.1.2) the TZ should be linked to the NSX Segment. Additionally, from NSX version 4.1.1 onward, HCX sets the property "com.vmware.nsx.port.extraConfig.remoteRtr" directly on the NSX Segment, which in turn causes NSX to set the property on the NSX Logical Switch.