NSX segment backed VM switch port blocked due to stale nsx-opsagent cache

Article ID: 437027

Updated On:

Products

VMware NSX

Issue/Introduction

  • After a VM vMotion, the network adapter (vNIC) appears in a "DISCONNECTED" state. Manual attempts to connect the adapter fail with the following error in vCenter:

Task name: Reconfigure virtual machine
Error: Failed to connect virtual device ethernet0

  • In the NSX Manager proton log (/var/log/proton/nsxapi.log), you may observe a sequence in which a detach is processed successfully, but a subsequent attach request for the same VIF ID references a different logical switch, indicating a stale port from an earlier attachment.

  • Detach from the source host for vMotion:

INFO L2TaskExecutor4 LogicalPortServiceImpl 77790 SWITCHING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] dvport [##########################] to be detached on host: [##########################]
INFO L2TaskExecutor4 VifAttachmentRpcHandler 77790 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] VifMsgHandler.END: processed Vif operation [112-35] (msgId=null:null), (msg=message_id: ""
operation: DETACH_VIF_FROM_PORT
type: RESPONSE
port_attachment {
 logical_port_uuid: "##########################"
 vif_uuid: "47f3##########################"
 transport_zone_uuid: "##########################"
  host_operation_id: "112-35"
 logical_switch_uuid: "8f46e##########################"
}

  • Attach request received from the source host for logical switch 7bc39b##########################:

INFO L2TaskExecutor20 VifAttachmentRpcHandler 77790 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] VifMsgHandler.BEGIN: Received VifMsg [null:null]: "operation: ATTACH_VIF_TO_PORT
type: REQUEST
vif_attachment {
 vif_uuid: "47f3##########################"
 logical_switch_uuid: "7bc39b##########################"
  logical_port_uuid: ""
 host_id: "6c087##########################"
 vmx_path: "/vmfs/volumes/<redacted>"
  host_operation_id: "112-36"
}
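
To make this pattern easier to spot in a large log, the check can be sketched as a small shell function. This is an illustrative helper, not a VMware-supplied tool: the function name `find_vif_switch_changes` is hypothetical, and it assumes the multi-line message layout shown in the excerpts above (a `vif_uuid` line followed by a `logical_switch_uuid` line, with each message closed by a `}` line). It prints every VIF that appears with a different logical switch than in its previous operation.

```shell
#!/bin/sh
# Hypothetical helper: report VIF UUIDs whose logical switch UUID changes
# between consecutive VIF operations in an nsxapi.log-style file.
# Assumes the field layout shown in the log excerpts above.
find_vif_switch_changes() {
    awk '
        # Remember which operation the current message describes.
        /operation: (ATTACH_VIF_TO_PORT|DETACH_VIF_FROM_PORT)/ {
            op = ($0 ~ /ATTACH_VIF_TO_PORT/) ? "ATTACH" : "DETACH"
        }
        # The UUID is the first double-quoted value on the line.
        /vif_uuid:/ { split($0, q, "\""); vif = q[2] }
        /logical_switch_uuid:/ {
            split($0, q, "\""); ls = q[2]
            if (vif == "" || ls == "") next
            if ((vif in last) && last[vif] != ls)
                printf "%s: %s on %s after %s on %s\n", vif, op, ls, lastop[vif], last[vif]
            last[vif] = ls
            lastop[vif] = op
        }
        # A closing brace ends the message; forget the current VIF.
        $0 == "}" { vif = "" }
    ' "$1"
}
```

For example, `find_vif_switch_changes /var/log/proton/nsxapi.log` run on the NSX Manager would list candidate VIFs. A reported VIF is only a lead: a deliberate segment change produces the same sequence, so confirm against the symptoms described above.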

Environment

VMware NSX 

Cause

This issue is caused by stale cached data in the `nsx-opsagent` service on the ESXi host. The data is left behind by an earlier cross-logical-switch migration (for example, when the VM's logical segment was changed). When the VM is subsequently migrated, the stale cache entry is relayed to the NSX Manager, which deletes the logical port and blocks the port that the vNIC is connected to on the destination host.

Resolution

Temporary Workaround:

  • To immediately restore connectivity, perform a vMotion of the impacted VM to another host in the cluster. This allows the network adapter to connect successfully on the new destination host.

Permanent Mitigation (Clearing Cache):

  • To permanently resolve the problem on the affected host and prevent recurrence during migration, the nsx-opsagent service must be restarted on the ESXi host where the VM has resumed connectivity and is in a "good" state.

Note: Restarting the nsx-opsagent service does not require host downtime and should not cause any network disruptions to running virtual machines.

/etc/init.d/nsx-opsagent restart
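
The restart can be followed by a quick check that the service came back up. The following is a minimal sketch, assuming the init script also accepts a `status` argument and returns a non-zero exit code when the service is not running. The function name `restart_opsagent` and the `OPSAGENT_INIT` and `SETTLE_SECONDS` variables are illustrative and not part of NSX.

```shell
#!/bin/sh
# Path to the init script from this article; overridable for testing.
OPSAGENT_INIT="${OPSAGENT_INIT:-/etc/init.d/nsx-opsagent}"

# Restart nsx-opsagent, wait briefly, then query its status.
# Assumption: "status" is a supported argument of the init script.
restart_opsagent() {
    "$OPSAGENT_INIT" restart || return 1
    # Give the service a few seconds to settle before checking it.
    sleep "${SETTLE_SECONDS:-5}"
    "$OPSAGENT_INIT" status
}
```

Calling `restart_opsagent` in an SSH session on the ESXi host performs the restart and returns the exit code of the status query.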

Important Considerations:

  • Recurrence: This is a rare scenario that occurs when a VM reconfiguration or migration across logical switches fails to process the detach operation from the source logical switch.
  • Proactive Maintenance: In such instances, proactively restart the nsx-opsagent service on the host while the VM is in a good state to clear any new stale data.
  • Cluster-wide Mitigation: It is recommended to restart the nsx-opsagent service on all ESXi hosts within the cluster to ensure no other stale data persists.
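
For the cluster-wide restart, one possible approach is to loop over the hosts from an administrative workstation. The sketch below assumes SSH is enabled on each ESXi host and that root login is permitted, which may not match your environment. The function name `restart_cluster_opsagent`, the `SSH_CMD` variable, and the host list file are all hypothetical. Hosts are processed one at a time so that a failure is noticed before the whole cluster has been touched.

```shell
#!/bin/sh
# Command used to reach each host; set SSH_CMD=echo for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"

# restart_cluster_opsagent HOSTFILE
# HOSTFILE lists one ESXi host name per line; blank lines and lines
# starting with '#' are skipped. Returns non-zero if any restart failed.
restart_cluster_opsagent() {
    rc=0
    while read -r host; do
        case "$host" in ''|'#'*) continue ;; esac
        echo "Restarting nsx-opsagent on $host"
        # Redirect stdin so the remote command does not consume the host list.
        if ! $SSH_CMD "root@$host" /etc/init.d/nsx-opsagent restart </dev/null; then
            echo "FAILED: $host" >&2
            rc=1
        fi
    done < "$1"
    return $rc
}
```

Running `SSH_CMD=echo restart_cluster_opsagent hosts.txt` first prints the commands that would be sent without contacting any host.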

Permanent Fix: A permanent resolution is planned for a future NSX release.