VMware NVDS to CVDS migration hung: Uplink xxx-lag-0 is not found in target DVS

Article ID: 322643

Products

VMware NSX

Issue/Introduction

Symptoms:

  • You are performing an NVDS to CVDS migration on VMware NSX 3.2.x or lower.
  • You are migrating a host using the API: POST https://<nsx-mgr>/api/v1/transport-nodes/<tn-id>?action=migrate_to_vds
  • You receive the following error in the NSX-T UI:
Host configuration: MigrateOutofLS(xx xx xx xx 13 b4 42 2b-b5 6d ac d2 xx xx xx xx) failed: [Uplink [bb273-lag-0] is not found in target DVS [xx xx xx xx 60 02 4c-2d 6c f6 aa xx xx xx xx]]; LogicalSwitch full-sync: LogicalSwitch full-sync realization query skipped.
  • The host has vmnics configured in a LAG (link aggregation group).
  • On the NSX Manager, /var/log/proton/nsxapi.log shows:
ERROR MigrateToCvdsTaskExecutor18 MigrateToCvdsTask 71579 FABRIC [nsx@6876 comp="nsx-manager" errorCode="PM250050" level="ERROR" subcomp="manager"] MigrateToCvdsTask on host [xxxxxxxx-ab22-4890-9c0c-xxxxxxxxxxxx] failed. Current stage TN_UPDATE_MIGRATE_WAIT, Aborting all remaining stages.
...
ERROR L2HostConfigTaskExecutor5 TransportNodeAsyncUtils 71579 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP8817" level="ERROR" subcomp="manager"] Some error occured when configuring host switch on host: MigrateOutofLS(xx xx xx xx 44 8e 4e ca-a4 b5 eb 20 xx xx xx xx) failed: [socket read failed];
INFO L2HostConfigTaskExecutor5 TransportNodeAsyncServiceImpl 71579 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Set auto-retry interval to max after host xxxxxxxx-ab22-4890-9c0c-xxxxxxxxxxxx reported vmk/pnic migration failure 2 times
ERROR L2HostConfigTaskExecutor5 TransportNodeAsyncServiceImpl 71579 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP8700" level="ERROR" subcomp="manager"] MigrateOutofLS(xx xx xx xx 44 8e 4e ca-a4 b5 eb xx xx xx xx) failed: [socket read failed];
com.vmware.nsx.management.switching.common.exceptions.SwitchingException: null
...
  • On ESXi host /var/run/log/nsx-syslog.log:
nsx-opsagent[2101330]: NSX 2101330 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2102040" level="ERROR" errorCode="MPA42061"] [InvokeCommandInternal] Reading reply from socket failed. Restarting python process..

nsx-opsagent[2101330]: NSX 2101330 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2102040" level="INFO"] [setNsxaVimNotReady] Updating nsxa health to down on nsxavim shutdown

nsx-opsagent[2101330]: NSX 2101330 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2102040" level="ERROR" errorCode="MPA42056"] Command : getVmksInLogicalSwitchesAndVmksOfGivenTypes(management) | Expected reply : ok | Reply : ok|vmk0,xxxxxxxx-1d3e-4f53-9408-xxxxxxxxxxxx,xxxxxxxx-1ea0-4297-a072-xxxxxxxxxxxx,xx xx xx xx 44 8e 4e ca-a4 b5 eb 20 xx xx xx xx,xxxxxxxx-67f9-4f4c-8ff0-xxxxxxxxxxxx,management,,|vmk1,xxxxxxxx-7284-4fc5-bbdb-xxxxxxxxxxxx,xxxxxxxx-1c66-45ed-a9d4-xxxxxxxxxxxx,xx xx xx xx 44 8e 4e ca-a4 b5 eb 20 xx xx xx xx,xxxxxxxx-247f-48c6-88e3-xxxxxxxxxxxx,,,|
  • On ESXi host /var/run/log/nsxaVim.log:
nsxaVim: [2435427]: INFO MigrateToProxySwitch(xx xx xx xx 59 68 27 0a-67 6c 2a 8f xx xx xx xx): created portData with key h-1 and conCookie 2 for vmk1^@
nsxaVim: [2435427]: INFO MigrateToProxySwitch(xx xx xx xx 59 68 27 0a-67 6c 2a 8f xx xx xx xx): created portData with key h-2 and conCookie 3 for vmk0^@
nsxaVim: [2435427]: INFO MigrateToProxySwitch vmk2DvpgActiveUplinksMap {'vmk1': ('vmotion_dvs-123456', ['host-lag'], True), 'vmk0': ('management_dvs-123456', ['host-lag'], True)}^@
nsxaVim: [2435427]: INFO MigrateToProxySwitch: vmksMigrate2AnyPnic ['vmk1', 'vmk0']^@
nsxaVim: [2435427]: ERROR Server: got Exception: Traceback (most recent call last): File "/usr/lib64/vmware/nsx-opsagent/pyvim/nsxa/nsxaVim.py", line 3996,
...
AddToActiveUplinkKey2VmksMap uplink_key = dvsPortsInfo.uplinkPortName2KeyMap[uplinkname] KeyError: 'host-lag'^@ 



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
VMware NSX-T Data Center 3.x

Cause

This issue occurs during NVDS to CVDS migration when the NVDS has vmkernel interfaces (vmks) attached to uplinks configured as a LAG. When a LAG is configured on a switch, the active uplink names should be the LAG name with a member index appended (for example, host-lag-0, host-lag-1). During migration, however, the active uplink is looked up using only the bare LAG name, without the index. No uplink with that name exists in the target DVS, so the lookup fails and the migration aborts.
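
The KeyError at the end of nsxaVim.log above is this lookup failing. A minimal Python sketch of the mismatch (the map and key names are taken from the traceback; the values are illustrative):

# The target DVS exposes its uplinks keyed by LAG member name (name plus index).
uplinkPortName2KeyMap = {
    'host-lag-0': 'uplink-port-key-1',
    'host-lag-1': 'uplink-port-key-2',
}

# The vmk's port group reports only the bare LAG name as its active uplink
# (see vmk2DvpgActiveUplinksMap in the log above).
uplinkname = 'host-lag'

# The failing lookup from the traceback: no bare 'host-lag' entry exists.
uplink_key = uplinkPortName2KeyMap[uplinkname]  # raises KeyError: 'host-lag'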

Resolution

This is a known issue impacting VMware NSX.

Workaround:
To work around this issue, migrate the vmks to a standard vSwitch, perform the NVDS to CVDS migration, and then migrate the vmks back to the CVDS.
Note: The vSwitch must have at least 1 vmnic on it. If you do not have a standard vSwitch, create one, add a vmnic, and then follow the below steps.

For an overview of the steps below, please review the vSphere Networking Guide.

Stage 1: Migrate the vmks from the NVDS to the vSwitch one by one with the below steps (a scripted alternative is sketched after this list).
1. Locate the host in the vCenter UI and navigate to Configure - Virtual Switches on the host.
2. Expand the vSwitch, click the 3 dots, then click Migrate VMkernel Adapter.
3. Select vmk0 and click Next. On the Configure settings screen, click Next, then on Ready to complete, click Finish.
4. The migration of vmk0 to the vSwitch may take some time.
Repeat steps 2 to 4 for the remaining vmks to migrate them from the NVDS to the vSwitch.
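
If you prefer to script Stage 1, the below pyVmomi sketch shows the equivalent operation for a single vmk. This is an illustrative sketch, not a supported tool: the vCenter address, credentials, host name, vSwitch name, port group name, and vmk device are all placeholder values to adapt, so validate it in a lab first.

import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details -- replace with your environment's values.
ctx = ssl._create_unverified_context()  # lab only; use verified TLS in production
si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='changeme', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esx01.example.com')
net_sys = host.configManager.networkSystem

# If no standard vSwitch exists yet, create one and give it a free vmnic:
# net_sys.AddVirtualSwitch(vswitchName='vSwitch1',
#     spec=vim.host.VirtualSwitch.Specification(numPorts=128,
#         bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=['vmnic2'])))

# Create a standard port group on the vSwitch for the vmk to land on.
net_sys.AddPortGroup(vim.host.PortGroup.Specification(
    name='mgmt-temp', vlanId=0, vswitchName='vSwitch1',
    policy=vim.host.NetworkPolicy()))

# Re-point vmk0 at the standard port group; fields left unset in the spec,
# such as the IP configuration, are preserved.
net_sys.UpdateVirtualNic('vmk0', vim.host.VirtualNic.Specification(
    portgroup='mgmt-temp'))

Disconnect(si)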

Stage 2: Retrigger the migration for the host with the below steps (a scripted version of the full sequence follows step 6).
1. Clean up the old topology by running the below API:

POST https://<nsx_manager_ip>/api/v1/nvds-urt?action=cleanup

2. Create a new precheck with the below API and note the precheck ID from the output:

POST https://<nsx_manager_ip>/api/v1/nvds-urt/precheck

3. Generate a new URT topology with the below API, using the precheck ID from step 2:

GET https://<nsx_manager_ip>/api/v1/nvds-urt/topology/<precheck_id>

4. Apply the topology using the below API, with the payload received as output from step 3. Make sure to change vds_name in the payload before triggering apply topology.

POST https://<nsx_manager_ip>/api/v1/nvds-urt/topology?action=apply

5. Retrigger the migration for the host using the below API:

POST https://<nsx_manager_ip>/api/v1/transport-nodes/<tn_id>?action=migrate_to_vds

6. Verify that the migration state changes to TN_MIGRATION_COMPLETED after all the migration steps complete, using the below API:

GET https://<nsx_manager_ip>/api/v1/nvds-urt/status-summary/<precheck_id>
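
The Stage 2 sequence can also be driven from a script. Below is a minimal Python sketch using the requests library; the NSX Manager address, credentials, transport node ID, and the exact response field names (precheck_id is assumed here) should be checked against your NSX version's API guide before use.

import requests

NSX = 'https://nsx-mgr.example.com'   # placeholder NSX Manager address
TN_ID = 'xxxxxxxx-ab22-4890-9c0c-xxxxxxxxxxxx'  # transport node ID of the host
s = requests.Session()
s.auth = ('admin', 'changeme')        # placeholder credentials
s.verify = False                      # lab only; use a trusted CA bundle in production

# 1. Clean up the old topology.
s.post(f'{NSX}/api/v1/nvds-urt?action=cleanup').raise_for_status()

# 2. Create a new precheck and note its ID (response field name assumed).
r = s.post(f'{NSX}/api/v1/nvds-urt/precheck')
r.raise_for_status()
precheck_id = r.json()['precheck_id']

# 3. Generate the URT topology for that precheck.
topology = s.get(f'{NSX}/api/v1/nvds-urt/topology/{precheck_id}').json()

# 4. Edit vds_name in the topology payload, then apply it
#    (adjust the field location to match your payload structure).
s.post(f'{NSX}/api/v1/nvds-urt/topology?action=apply',
       json=topology).raise_for_status()

# 5. Retrigger the migration for the host.
s.post(f'{NSX}/api/v1/transport-nodes/{TN_ID}'
       '?action=migrate_to_vds').raise_for_status()

# 6. Poll until the state reaches TN_MIGRATION_COMPLETED.
print(s.get(f'{NSX}/api/v1/nvds-urt/status-summary/{precheck_id}').json())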


Stage 3: Migrate the vmks and vmnics from the vSwitch to the CVDS switch, so that they again use the LAG uplinks, with the below steps.
Note: In the below example, the names of the vmks, port groups, CVDS, and vmnic numbers may differ in your setup.
As the NVDS migration is complete and the NVDS has been removed, any vmnics previously used by the NVDS are now free.

Now migrate the vmks from the vSwitch to the CVDS switch with the below steps (a scripted sketch for the vmk portion follows the list):
1. Locate the host in the vCenter UI and navigate to Configure - Virtual Switches on the host.
2. Expand the CVDS, click the 3 dots, then click Migrate Networking.
3. Under Manage physical adapters, for the first vmnic (which will be in the LAG), select host-lag-0 under Assign uplink and click Next.
4. Under Manage VMkernel adapters, for vmk0 click ASSIGN PORT GROUP and select management. Make sure that an NSX port group ID exists for that entry. Click Assign under Action, then click Next.
5. On the Migrate VM networking screen, click Next, then click Finish. This migrates vmnic0 and vmk0 to the CVDS switch, using the host-lag-0 uplink.
6. Repeat steps 3 to 5 to migrate vmnic1 and vmk1 to the CVDS switch, selecting the vmotion port group and the host-lag-1 uplink.
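
Re-adding the vmnics to the LAG is easiest in the UI as described above, but the vmk half of Stage 3 can also be scripted. Below is a minimal pyVmomi sketch reusing the connected session from the Stage 1 sketch; dvs_pg stands for the target NSX port group object and is a placeholder to resolve in your own inventory.

from pyVmomi import vim

# Assumes 'net_sys' from the Stage 1 sketch, and 'dvs_pg' resolved to the
# target vim.dvs.DistributedVirtualPortgroup (e.g. the management port group).
conn = vim.dvs.PortConnection(
    portgroupKey=dvs_pg.key,
    switchUuid=dvs_pg.config.distributedVirtualSwitch.uuid)

# Re-point vmk0 at the distributed port group; unset fields are preserved.
net_sys.UpdateVirtualNic('vmk0', vim.host.VirtualNic.Specification(
    distributedVirtualPort=conn))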