VM has no network connectivity due to a blocked port after power-on or vMotion

Article ID: 391622

Updated On:

Products

VMware NSX

Issue/Introduction

  • The NSX UI may display a Cluster Degraded alarm.
  • Newly created VMs, or VMs recently migrated with vMotion, do not connect to the network.
  • In the vSphere Client, under Networking -> vDS Name -> Ports, the impacted VM port is in a "Blocked" state.
    Log lines similar to the following appear on the ESXi host in /var/run/log/vmkernel.log:
    In(182) vmkernel: cpu81:126647407 opID=52cdc12d)kcp: KCP_DeletePort:958: [nsx@6876 comp="nsx-esx" subcomp="kcp"]Port ###### is cleared and blocked
    Output similar to the following appears on the ESXi host when running net-dvs -l:
            port ########-####-####-####-############:
                  com.vmware.common.port.volatile.status = inUse linkUp blocked portID=###### Port blocked by admin propType = RUNTIME
  • New configuration changes, such as segment or policy updates, are delayed or blocked.
  • vMotion of a VM may fail with an error similar to the following in the vSphere Client:
    "Currently connected network interface 'Network adapter X' uses network 'DVSwitch[50 3e ## ## ## ## ## ##-## ## ## ## ## ## 42 4e] NSX port group [dvportgroup-#####](nsxa down)', which is not accessible."
  • The connection from ESXi hosts to the Central Control Plane may briefly flap between two NSX Manager nodes during resharding, as part of the automatic recovery.
    Log lines similar to the following appear on the ESXi host in /var/run/log/vmkernel.log:
    In(182) vmkernel: cpu1:2176110)vdl2: VDL2CPProcessLinkChange:6889: [nsx@6876 comp="nsx-esx" subcomp="vdl2-####"]Control plane link down[IP: ###.###.##.##] for VNI[####]
  • This condition can occur in one of two scenarios:

    • Scenario #1: The Controller service transaction processing thread is blocked, but the Controller service restarts and recovers automatically after 2 hours (7200 seconds).
      Log lines similar to the following appear on the NSX Manager in /var/log/cloudnet/nsx-ccp.log:
      ERROR FalconThread-0 AbstractDependencyBasedDataDiscoverer 74130 - [nsx@6876 comp="nsx-controller" errorCode="CCP1310211" level="ERROR" subcomp="magpie"] Parallel invocation of features encountered error with concurrent listeners: {}
      java.util.concurrent.TimeoutException: Shutdown timer hit after 7200 seconds
      Log lines similar to the following appear on the NSX Manager in /var/log/cloudnet/nsx-ccp-events.log:
      EVENT WrapperSimpleAppMain Main 2200380 - [nsx@6876 comp="nsx-controller" level="EVENT" subcomp="main"] CCP process started
    • Scenario #2: The Controller service transaction processing thread is blocked indefinitely and does not recover on its own.
      On the NSX Manager, no log lines containing "ForkJoinPool.commonPool", such as the example below, appear in /var/log/cloudnet/nsx-ccp.log for 30 minutes or longer:
      INFO ForkJoinPool.commonPool-worker-63 ShardingManagerImpl 87947 - [nsx@4413 comp="nsx-controller" level="INFO" subcomp="magpie"] Notify listeners for sharding update with revision ########

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
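The ESXi host symptoms listed above can be checked with a short script. The following is a minimal Python sketch, not a VMware tool: the regular expressions are assumptions derived only from the example excerpts in this article, and the helper names (scan_vmkernel_log, blocked_ports_from_net_dvs) are hypothetical. It takes the text of /var/run/log/vmkernel.log or captured net-dvs -l output and reports ports that were cleared and blocked, Control plane link down events, and ports whose volatile status includes "blocked".

```python
import re

# Hypothetical patterns, based only on the example excerpts in this article;
# real log formats may differ between releases.
KCP_BLOCKED = re.compile(r"KCP_DeletePort:\d+:.*Port (\S+) is cleared and blocked")
CP_LINK_DOWN = re.compile(
    r"Control plane link down\[IP: ([^\]]+)\] for VNI\[([^\]]+)\]"
)
STATUS_KEY = "com.vmware.common.port.volatile.status"


def scan_vmkernel_log(text):
    """Return (blocked_port_ids, link_down_events) found in vmkernel.log text.

    link_down_events is a list of (ip, vni) string pairs.
    """
    blocked, link_down = [], []
    for line in text.splitlines():
        m = KCP_BLOCKED.search(line)
        if m:
            blocked.append(m.group(1))
        m = CP_LINK_DOWN.search(line)
        if m:
            link_down.append((m.group(1), m.group(2)))
    return blocked, link_down


def blocked_ports_from_net_dvs(output):
    """Return the IDs of ports whose volatile status includes 'blocked'.

    Assumes each 'port <id>:' header line precedes that port's properties,
    as in the net-dvs -l excerpt in the symptom list.
    """
    blocked = []
    current_port = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("port ") and line.endswith(":"):
            # Remember which port the following property lines belong to.
            current_port = line[len("port "):-1]
        elif current_port is not None and line.startswith(STATUS_KEY):
            if "blocked" in line.split():
                blocked.append(current_port)
    return blocked
```

Because output formats may vary by release, treat an empty result as inconclusive and confirm against the Ports view in the vSphere Client.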

Environment

VMware NSX 4.2.0.x
VMware NSX 4.2.1.x

Cause

Due to a JDK defect (JDK-8330017), the Java ForkJoinPool may incorrectly determine that its total number of threads is over the limit and block new thread requests. This in turn blocks the NSX Controller transaction processing thread.
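The parent article attributes the defect to an overflow of the Release Count (RC) field in the ForkJoinPool ctl word. To illustrate how a packed counter can overflow only after long uptime, the following Python sketch is a simplified, hypothetical model, not the JDK source: it assumes a signed 16-bit count stored in the top 16 bits of a 64-bit word and a hypothetical accounting leak of one unit per event. Once the count passes its maximum, the decoded value wraps and no longer reflects the real state.

```python
# Simplified, hypothetical model of a 16-bit signed field packed into the
# top bits of a 64-bit word. This is NOT the JDK ForkJoinPool source.
FIELD_SHIFT = 48             # assumption: field occupies bits 48..63
FIELD_MASK = 0xFFFF          # 16-bit field
WORD_MASK = (1 << 64) - 1    # emulate a 64-bit Java long


def get_field(word):
    """Decode the 16-bit field as a signed (two's complement) integer."""
    raw = (word >> FIELD_SHIFT) & FIELD_MASK
    return raw - 0x10000 if raw & 0x8000 else raw


def add_to_field(word, delta):
    """Add delta to the field, wrapping as 64-bit arithmetic would."""
    return (word + (delta << FIELD_SHIFT)) & WORD_MASK


word = 0
# Hypothetical leak: each event adds one unit that is never released.
for _ in range(32767):
    word = add_to_field(word, 1)
print(get_field(word))  # 32767, the largest representable value
word = add_to_field(word, 1)
print(get_field(word))  # -32768, the field has wrapped
```

The slower the leak, the longer the uptime before the wrap, which is consistent with the issue recurring on a schedule tied to Controller service uptime.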
There are two possible scenarios:

  • Scenario #1: The Controller service is impacted, but it restarts automatically after 2 hours and recovers. VMs requesting a new network connection after a vMotion or power-on are impacted during this 2-hour window.
  • Scenario #2: ForkJoinPool.commonPool may become blocked, and the Controller service cannot recover without a manual restart. VMs requesting a new network connection after a vMotion or power-on are impacted until the issue is manually resolved.
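The NSX Manager log evidence for each scenario, listed under Issue/Introduction, can be checked with a short Python sketch. It is hypothetical: the function name classify and its return labels are illustrative, the marker strings are copied from the example excerpts, and the caller is assumed to have already parsed each nsx-ccp.log line into a (timestamp, line) pair, since this article does not show the timestamp format.

```python
from datetime import datetime, timedelta

# Marker strings copied from the example log excerpts in this article.
SHUTDOWN_TIMER = "Shutdown timer hit after 7200 seconds"
CCP_STARTED = "CCP process started"
FJP_MARKER = "ForkJoinPool.commonPool"
SILENCE = timedelta(minutes=30)


def classify(ccp_entries, events_lines, now):
    """Return a hypothetical label for the NSX Manager log evidence.

    ccp_entries:  (timestamp, line) pairs from nsx-ccp.log, already parsed.
    events_lines: raw lines from nsx-ccp-events.log.
    now:          datetime marking the end of the observation window.
    """
    fjp_times = [ts for ts, line in ccp_entries if FJP_MARKER in line]
    # With no ForkJoinPool.commonPool line at all, silence cannot be told
    # apart from log rotation, so Scenario #2 is only flagged after a gap.
    if fjp_times and now - max(fjp_times) >= SILENCE:
        return "scenario 2 suspected"
    timed_out = any(SHUTDOWN_TIMER in line for _, line in ccp_entries)
    restarted = any(CCP_STARTED in line for line in events_lines)
    if timed_out and restarted:
        return "scenario 1 seen"
    return "no match"


t0 = datetime(2024, 1, 1, 12, 0)
entries = [(t0, "INFO ForkJoinPool.commonPool-worker-63 ShardingManagerImpl ...")]
print(classify(entries, [], t0 + timedelta(minutes=45)))  # scenario 2 suspected
```

Scenario #2 is checked first because a current, ongoing block is the more urgent finding; an earlier Scenario #1 recovery can still be present in the same logs.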

Note: This issue is expected to recur depending on the uptime of the Controller service. Medium form factor Managers can experience the issue after 6 weeks of uptime, and Large/Extra Large form factor Managers after more than 3 months.
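Because recurrence tracks Controller service uptime, a monitoring script could flag Managers approaching the windows given in the note. The following Python sketch is hypothetical: the helper name approaching_risk is illustrative, it treats "more than 3 months" as 90 days and uses an arbitrary 7-day warning margin (neither is a documented threshold), and the caller must supply the uptime.

```python
from datetime import timedelta

# Approximate windows from the note; 90 days for "3 months" and the
# 7-day margin below are assumptions, not documented thresholds.
RISK_WINDOW = {
    "medium": timedelta(weeks=6),
    "large": timedelta(days=90),
    "extra_large": timedelta(days=90),
}


def approaching_risk(form_factor, uptime, margin=timedelta(days=7)):
    """Return True if Controller uptime is within `margin` of, or past,
    the window in which this issue has been observed for the form factor."""
    key = form_factor.strip().lower().replace(" ", "_")
    window = RISK_WINDOW[key]
    return uptime >= window - margin


print(approaching_risk("Medium", timedelta(days=40)))  # True
```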

Resolution

For the resolution and workaround, refer to the parent article that consolidates guidance on this issue: NSX is impacted by JDK-8330017: ForkJoinPool stops executing tasks due to ctl field Release Count (RC) overflow.