Failed to power on VM. Error: The object or item referred to could not be found. (vDS port cannot be found)

Article ID: 388001

Products

VMware NSX

Issue/Introduction

  • The error in the article title is a symptom of stale logical port attachers, which bind logical ports to the vDS on ESXi.
  • This issue can coincide with the one described in "Scripted cleanup of stale logical ports on NSX segments". The scripted solution from that article may need to be run in conjunction with the script attached to this KB.

The following logs can be used to identify this issue:

NSX-T manager: /var/log/proton/nsxapi.log

While processing a hostd-initiated VIF detach, opsagent sends a VIF attach message (MP_AddVnicAttachment) to the MP, as shown below. This creates a stale LogicalPort on the NSX MP that does not exist on the host.


2024-10-07T16:32:05.226Z nsx-opsagent[2219843]: NSX 2219843 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2220371" level="INFO"] [DoVifPortOperation] request=[opId:[5063] op:[HOSTD_DETACH_PORT(2)] vif:[########-####-####-####-############] ls:[########-####-####-####-############] vmx:[/vmfs/volumes/vsan:5268df41ee63a87b-################/########-####-####-####-############/wiwk22jboxdr07_replica.vmx] lp:[]]

ESXi host: /var/log/nsx-syslog.log

2024-10-07T16:32:05.290Z nsx-opsagent[2219843]: NSX 2219843 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2220371" level="INFO"] [PortOp] Cleared external id from port [########-####-####-####-############] successfully
2024-10-07T16:32:05.342Z nsx-opsagent[2219843]: NSX 2219843 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2220374" level="INFO"] [NsxaAppRxCallback] Got Message in app_type:[SwitchingVertical]
2024-10-07T16:32:05.342Z nsx-opsagent[2219843]: NSX 2219843 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2220371" level="INFO"] [MP_AddVnicAttachment] RPC call [5063-5070] to NSX management plane completed in [0] sec

2024-10-07T16:55:28.803Z nsx-opsagent[3265162]: NSX 3265162 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="22921014" level="INFO"] [HandlePriorAttachedPort] handling prior attachment for vif: ls:[########-####-####-####-############] lp:[########-####-####-####-############] tz:[########-####-####-####-############]
2024-10-07T16:55:28.803Z nsx-opsagent[3265162]: NSX 3265162 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="22921014" level="WARNING"] [PortOp] Port [########-####-####-####-############] DVSPROP_PORT_VNIC_EXTERNAL_ID not found ... already cleared on previous vif event, error code [bad0003]
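The log signature above can be searched for with a short script. The following is a minimal sketch, not a supported tool. It assumes that the first number in the MP_AddVnicAttachment "RPC call [5063-5070]" field is the opId of the request that triggered it, which is inferred only from the sample lines in this article. The function name is hypothetical.

```python
import re

# Patterns are built from the sample log lines in this article.
DETACH_RE = re.compile(
    r"\[DoVifPortOperation\] request=\[opId:\[(\d+)\] op:\[HOSTD_DETACH_PORT\(\d+\)\]"
)
ATTACH_RPC_RE = re.compile(r"\[MP_AddVnicAttachment\] RPC call \[(\d+)-\d+\]")


def find_attach_during_detach(lines):
    """Return opIds of hostd-initiated detach requests that were followed
    by an MP_AddVnicAttachment RPC carrying the same opId prefix.

    Assumption: the number before the dash in "RPC call [N-M]" is the opId.
    """
    detach_op_ids = set()
    suspects = []
    for line in lines:
        detach = DETACH_RE.search(line)
        if detach:
            detach_op_ids.add(detach.group(1))
            continue
        attach = ATTACH_RPC_RE.search(line)
        if attach and attach.group(1) in detach_op_ids:
            suspects.append(attach.group(1))
    return suspects
```

To use it, pass the lines of a copy of the host log, for example the result of open("nsx-syslog.log").readlines(). Any opId it returns marks a detach that is worth comparing against the signature above.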

Environment

VMware NSX

Cause

Stale LogicalPorts and their stale LogicalPortAttachers have an incorrect host-VMX path mapping. They are created during processing of a hostd-initiated VIF detach, which relies on an internal in-memory cache (_lsSwapMap) maintained by opsagent.

The stale port is then used for subsequent VIF attach/detach requests, which always results in one extra, incorrect attacher entry. This leads to incorrect behavior during these VIF requests (for example, the MP returning a new LogicalPort/VIF to opsagent/host when the VIF was already specified by opsagent/host).

This eventually causes connectivity issues for the impacted VMs.

This situation arises only when a hostd-initiated VIF detach is missing for the VIF/VM and the opsagent service on the transport node receives two successive VIF attach requests from hostd for the same VIF, each connecting it to a different LogicalSwitch.

The cache (_lsSwapMap) is populated at some point by a network change during a VIF attach on logical switch 1. A VIF attach is then received to connect the VIF to logical switch 2, creating an entry of the form <Key=VIF:vmxPath, Value={LS1, LS2}>. This entry is never cleared, because no corresponding VIF detach is received on logical switch 1.

On the subsequent VIF detach from this host for this VIF, a VIF attach is sent to the MP, creating the stale LogicalPort.

Resolution

Workaround:
1. Identify all stale LogicalPorts and LogicalPortAttachers as follows:

a) Get all LogicalPorts and LogicalPortAttachers using the following APIs:


GET /api/v1/logical-ports
GET /api/v1/logical-ports/<lport-id>/state - (Fields of importance are 'transport_node_ids' and 'attachment.attachers.host')


2. For each LogicalPort, read the corresponding LogicalPortAttacher and check whether the LogicalPort is present on the host (transport node ID) named in the attacher entry.

To check whether the port exists, run the command 'net-dvs -l | less' on the host.

If the port does not exist, the attacher entry can be deleted.

Alternatively, if the hosts are vCenter-managed, the presence of the LogicalPorts (as DVPorts in vCenter) can be confirmed directly from vCenter.
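The comparison in the steps above can be sketched in Python. This is an illustration under stated assumptions, not the script attached to this KB. It assumes that the list API returns its ports under a "results" key, that 'attachment.attachers' is a list (or a single object) with a 'host' field, and that the logical port ID appears verbatim in saved 'net-dvs -l' output. Pagination is not handled. The function names and the get_json callable, which stands in for any authenticated HTTP client, are hypothetical. The sketch only reports candidates and deletes nothing.

```python
from typing import Callable, Dict, List


def collect_attachers(get_json: Callable[[str], dict]) -> Dict[str, dict]:
    """Read every logical port and keep the state fields named in this KB.

    get_json is a caller-supplied function that performs an authenticated
    GET against the NSX Manager and returns the decoded JSON body.
    """
    # Assumption: the list response holds the ports under "results".
    ports = get_json("/api/v1/logical-ports").get("results", [])
    summary = {}
    for port in ports:
        port_id = port["id"]
        state = get_json(f"/api/v1/logical-ports/{port_id}/state")
        attachers = state.get("attachment", {}).get("attachers", [])
        if isinstance(attachers, dict):  # tolerate a single attacher object
            attachers = [attachers]
        summary[port_id] = {
            "transport_node_ids": state.get("transport_node_ids", []),
            "attacher_hosts": [a.get("host") for a in attachers],
        }
    return summary


def stale_candidates(summary: Dict[str, dict], host: str,
                     net_dvs_output: str) -> List[str]:
    """Return port IDs that have an attacher on `host` but do not appear
    in that host's saved `net-dvs -l` output.

    Assumption: a port that exists on the host is listed in the output
    with its ID spelled exactly as the API returns it.
    """
    return [
        port_id
        for port_id, info in summary.items()
        if host in info["attacher_hosts"] and port_id not in net_dvs_output
    ]
```

The value passed as host must match whatever the 'attachment.attachers.host' field contains in the environment. Treat every returned ID as a candidate to verify manually, not as a confirmed stale entry.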



File an SR with VCF NSX, including the information gathered above, to have any stale logical ports or logical port attacher entries removed.

Additional Information

If this issue is suspected, please open an SR with the VCF NSX team for further investigation and remediation.