After rebooting an NSX Manager, VMs are reported to have no network connectivity due to a blocked port

Article ID: 400959

Products

VMware NSX

Issue/Introduction

  • The NSX UI shows the Manager cluster in a degraded state.
  • VMs that were migrated (vMotioned) after the NSX Manager reboot have ports reported in a "Blocked" state. This can be seen in the vSphere Client: Networking -> vDS Name -> Ports
  • Log lines similar to the following appear on the vMotion destination ESXi host in /var/run/log/vmkernel.log:
2025-04-17T12:52:25.173Z cpu62:2098533)NetPort: 3054: blocking traffic on DV port <PORT UUID>
2025-04-17T12:52:25.173Z cpu62:2098533)kcp: KCP_DeletePort:761: [nsx@6876 comp="nsx-esx" subcomp="kcp"]Port <ID> is cleared and blocked
  • On the host, /var/run/log/nsx-syslog shows that the vNIC attachment fails because the RPC call to the Management Plane times out:
2025-04-17T12:51:50.324Z nsx-opsagent[2103799]: NSX 2103799 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="<ID>" level="ERROR" errorCode="MPA41542"] >[MP_AddVnicAttachment] RPC call [lro-################] to NSX management plane timout
2025-04-17T12:51:50.324Z nsx-opsagent[2103799]: NSX 2103799 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="<ID>" level="ERROR" errorCode="MPA42003"] [DoMpVifAttachRpc] MP_AddVnicAttachment() failed: RPC call to NSX management plane timeout
  • After the Manager reboot, the /var/log/nvpapi/api_server logs show the API server waiting for the cluster to become stable:
2025-04-17T11:29:33.459Z napi.root.node.backup_restore INFO Waiting for management cluster to become stable
2025-04-17T11:29:36.385Z napi.root.node.backup_restore INFO REPEATS: 4 repeats in 2 sec: Waiting for management cluster to become stable

2025-04-17T11:29:38.461Z napi.root.node.backup_restore INFO Manager not available, reading restore state information from file: /home/secureall/secureall/backup/cluster/state.json
2025-04-17T11:29:38.462Z napi.root.node.backup_restore ERROR Restore status file not found: /home/secureall/secureall/backup/cluster/state.json
  • On the unhealthy NSX Manager node, /var/log/proton/nsxapi.log shows that the Proton JVM failed to start because the 'DiscoveredNodeStreamListener' Java bean failed to load:
2025-04-17T11:49:48.169Z ERROR localhost-startStop-1 ApplicationContextManager 4441 - [nsx@6876 comp="nsx-manager" errorCode="MP2112" level="ERROR" subcomp="manager"] Failed to start management application.org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'discoveredNodeStreamListener' defined in URL [jar:file:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/libpolicy-tn-deployment.jar!/com/vmware/nsx/management/policy/transportnodecollection/handler/DiscoveredNodeStreamListener.class]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.vmware.nsx.management.policy.transportnodecollection.handler.DiscoveredNodeStreamListener]: Constructor threw exception; nested exception is org.springframework.context.NoSuchMessageException: No message found under code 'errorcode.defaultErrorMessage' for locale 'en_US'. 
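
The log signatures listed above can be checked for quickly with a small script. The following is a minimal sketch, not a VMware tool: the patterns are copied from the excerpts in this article, while the label names and the functions `classify_line` and `count_signatures` are hypothetical names chosen for illustration. It assumes the relevant log files have already been copied from the ESXi host and the NSX Manager to a machine with Python 3.

```python
import re
import sys
from collections import Counter

# Patterns taken from the log excerpts in this article.
# The labels on the left are hypothetical names used only by this sketch.
SIGNATURES = {
    "port_blocked": re.compile(r"NetPort: \d+: blocking traffic on DV port"),
    "kcp_port_cleared": re.compile(r"KCP_DeletePort:\d+:.*is cleared and blocked"),
    "mp_rpc_timeout": re.compile(r'errorCode="MPA4(?:1542|2003)"'),
    "cluster_not_stable": re.compile(r"Waiting for management cluster to become stable"),
    "proton_bean_failure": re.compile(
        r"Error creating bean with name 'discoveredNodeStreamListener'"
    ),
}


def classify_line(line):
    """Return the label of the first matching signature, or None."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(line):
            return label
    return None


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        label = classify_line(line)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Usage: python scan_logs.py vmkernel.log nsxapi.log ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for label, hits in sorted(count_signatures(handle).items()):
                print(f"{path}: {label}: {hits}")
```

A match on its own does not confirm this issue. The combination of blocked ports on the host, management plane RPC timeouts, and the bean creation error on the Manager is what corresponds to the symptoms described here.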

Environment

VMware NSX

VMware NSX-T

Cause

  • The VM attach request is not redirected to one of the healthy Managers because, although the Proton service failed to start and the cluster considers Proton on that node to be down, the JVM (Java process) was not killed completely.
  • As a result, the NSX RPC server belonging to the old Proton process is still up, so hosts can establish connections to it and send messages to the failed node. Those incoming requests are never processed because the Proton service is not available, so the hosts receive no response and the calls time out.
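
The failure mode described above (a listener that still accepts connections but never answers) can be reproduced generically. The sketch below is only an illustration using plain TCP sockets on localhost: it does not implement the NSX RPC protocol, and `start_silent_listener` and `probe` are hypothetical helper names. It shows why a client sees a timeout rather than a connection error, which is why the host has no immediate reason to try another node.

```python
import socket
import threading


def start_silent_listener():
    """Listen on an ephemeral local port; accept connections but never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    held = []  # keep accepted sockets open so the client sees no reset

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, held


def probe(host, port, timeout=1.0):
    """Classify the endpoint as 'refused', 'connect_timeout',
    'connected_no_reply', 'closed', or 'replied'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as client:
            client.sendall(b"attach-request")  # placeholder payload
            try:
                data = client.recv(1024)
            except socket.timeout:
                # TCP connect succeeded, but nothing ever answered.
                return "connected_no_reply"
            return "replied" if data else "closed"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect_timeout"
```

Calling `probe` against the port returned by `start_silent_listener()` yields `connected_no_reply`, whereas a port with no listener yields `refused`. A fully stopped process would give the second result, which a client can act on immediately.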

Resolution

This issue is fixed in version 4.2.X, which prevents the DiscoveredNodeStreamListener bean creation from causing the Proton JVM to fail to start up fully.

Workaround:

  • Power off the unhealthy NSX Manager node and vMotion the VMs with blocked ports.
  • Replace the faulty node using the following procedure:

Replacing a faulty NSX-T manager node in a VCF environment