Logical ports missing from VM's in NSX security only cluster

Article ID: 367122

Products

VMware vDefend Firewall

Issue/Introduction

In NSX Security Only deployments, one or more of the following symptoms may be observed:

1. A slot 2 dvfilter is still applied to VMs that are in the "User Excluded Groups" exclusion list:

From an SSH session on the ESXi host, run the "summarize-dvfilter" command:

world 12398178 vmm0:<VM name> vcUuid:'50 2b 61 b1 d9 05 a2 f6-0b ec 51 f5 b1 58 9f ad'
port 100663396 <VM name>.eth0
vNic slot 2
name: nic-12398178-eth0-vmware-sfw.2 <----- Slot 2 is present. This VM is in exclusion list.
agentName: vmware-sfw
state: IOChain Attached

2. Some VMs may have a slot 2 dvfilter but no rules applied:

world 12671939 vmm0:<VM name> vcUuid:'50 2b c0 0d aa 1b af ad-18 01 28 be 02 1f 2a d3'
port 100663408 <VM name>.eth0
vNic slot 2
name: nic-12671939-eth0-vmware-sfw.2    <---------  Slot 2 is present

/bin/vsipioctl getrules -f nic-12671939-eth0-vmware-sfw.2

No rules.   <---------  No rules applied to VM.
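
The checks in symptoms 1 and 2 can be scripted against saved command output. The following is a minimal Python sketch, assuming the summarize-dvfilter output has been captured as text and follows the layout of the samples above. The function names and any VM names passed in are hypothetical and are not part of NSX or ESXi tooling.

```python
import re

# Patterns follow the sample summarize-dvfilter output shown above (assumed layout).
WORLD_RE = re.compile(r"^world\s+\d+\s+vmm0:(.+?)\s+vcUuid:")
SLOT2_RE = re.compile(r"^name:\s+(nic-\d+-eth\d+-vmware-sfw\.2)\b")


def slot2_filters(summary_text):
    """Map each VM name to the slot 2 vmware-sfw filter names listed under it."""
    filters = {}
    current_vm = None
    for raw_line in summary_text.splitlines():
        line = raw_line.strip()
        if line.startswith("world "):
            # A new world block starts; forget the previous VM if this is not a VM world.
            world = WORLD_RE.match(line)
            current_vm = world.group(1) if world else None
            continue
        slot2 = SLOT2_RE.match(line)
        if slot2 and current_vm is not None:
            filters.setdefault(current_vm, []).append(slot2.group(1))
    return filters


def getrules_command(filter_name):
    """Build the vsipioctl command used above to list the rules on a filter."""
    return f"/bin/vsipioctl getrules -f {filter_name}"


def has_no_rules(getrules_output):
    """Return True when the getrules output reports 'No rules.' as in symptom 2."""
    return any(
        line.strip().startswith("No rules.")
        for line in getrules_output.splitlines()
    )
```

Comparing the keys returned by slot2_filters() with the list of VMs in the exclusion list highlights symptom 1. Running the command from getrules_command() for each filter and passing its output to has_no_rules() highlights symptom 2.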

3. The following error may appear in nsx-syslog.log for impacted VMs:

2023-12-14T18:31:17.756Z nsx-opsagent[2102584]: NSX 2102584 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2103210" level="ERROR" errorCode="MPA44205"] [PortOp] Port set for extraConfig property [com.vmware.port.extraConfig.logicalPort.id] operation failed, error code [bad0003]

For these VMs, the following may also appear in the "net-dvs -l" output:

com.vmware.port.extraConfig.portOp.pending = attach_port , propType = CONFIG
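
Both signatures in symptom 3 can be searched for in saved copies of nsx-syslog.log and of the net-dvs -l output. This is a minimal Python sketch, assuming the log line and property line look like the samples above. The function names are hypothetical, and the patterns are derived only from those samples.

```python
import re

# Matches the opsagent error shown above and captures its timestamp and error code.
PORT_SET_FAILURE_RE = re.compile(
    r'^(?P<timestamp>\S+)\s+nsx-opsagent\[\d+\]:.*errorCode="MPA44205".*'
    r"\[com\.vmware\.port\.extraConfig\.logicalPort\.id\].*"
    r"error code \[(?P<code>\w+)\]"
)

# Matches the pending attach_port property shown above in net-dvs -l output.
PENDING_ATTACH_RE = re.compile(
    r"com\.vmware\.port\.extraConfig\.portOp\.pending\s*=\s*attach_port\b"
)


def port_set_failures(syslog_text):
    """Return (timestamp, error code) for each logicalPort.id port-set failure line."""
    failures = []
    for line in syslog_text.splitlines():
        match = PORT_SET_FAILURE_RE.match(line.strip())
        if match:
            failures.append((match.group("timestamp"), match.group("code")))
    return failures


def pending_attach_count(net_dvs_text):
    """Count the lines that show a pending attach_port operation."""
    return sum(
        1 for line in net_dvs_text.splitlines() if PENDING_ATTACH_RE.search(line)
    )
```

A non-empty result from port_set_failures() together with a non-zero pending_attach_count() on the same host suggests the host matches symptom 3.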

4. VMs may not have logical ports associated with their VIFs:

    1. Log in to the NSX UI.
    2. Navigate to Inventory > Virtual Machines.
    3. Search for the VM and select "VIEW DETAILS" under Virtual Interface.
    4. Observe that the Port is missing.

The following warning may also appear in /var/log/proton/nsxapi.log on the NSX Manager:

<Year>-<Month>-<Day><Time>  WARN PolicyClusterResourcesCleanupTaskScheduler1 PolicyResourceChangeNotificationManager 90537 POLICY [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Failure received invoking listener PolicyHostTransportNodeProfileNotificationListener for change DELETING on resource /infra/host-transport-node-profiles/<Transport-node-profile-ID>
com.vmware.nsx.management.policy.transportnodecollection.exception.TransportNodeProfileException: null
 at com.vmware.nsx.management.policy.transportnode.util.TransportNodeProfileUtilImpl.validateProfileNotInUse(TransportNodeProfileUtilImpl.java:534) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.TransportNodeProfileServiceImpl.preDelete(TransportNodeProfileServiceImpl.java:625) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.preDelete(PolicyHostTransportNodeProfileNotificationListener.java:135) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.handleResourceChange(PolicyHostTransportNodeProfileNotificationListener.java:81) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:161) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:134) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.deleteAndNotify(PolicyServiceImpl.java:1460) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody18(PolicyServiceImpl.java:736) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure19.run(PolicyServiceImpl.java:1) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96cproceed(AbstractTransactionAspect.aj:67) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect$AbstractTransactionAspect$1.proceedWithInvocation(AbstractTransactionAspect.aj:73) ~[?:?]
 at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.aspects.IntentTransactionAspect.invokeWithinTransaction(IntentTransactionAspect.java:75) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96c(AbstractTransactionAspect.aj:71) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody20(PolicyServiceImpl.java:718) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure21.run(PolicyServiceImpl.java:1) ~[?:?]
 at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.processWithTimer(TimedAspect.java:119) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.ajc$inlineAccessMethod$io_micrometer_core_aop_TimedAspect$io_micrometer_core_aop_TimedAspect$processWithTimer(TimedAspect.java:1) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.timedMethod(TimedAspect.java:97) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete(PolicyServiceImpl.java:718) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResources(PolicyClusterResourcesCleanupTaskForSecurity.java:240) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResourcesCreatedForSecurity(PolicyClusterResourcesCleanupTaskForSecurity.java:315) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.cleanupStaleSecurityResources(PolicyClusterResourcesCleanupTaskForSecurity.java:136) ~[?:?]
 at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[?:?]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_372]
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_372]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_372]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_372]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
 at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
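
To find which Transport Node Profiles the cleanup task attempted to delete, the warning above can be extracted from a saved copy of nsxapi.log. This is a minimal Python sketch, assuming the warning text matches the sample above. The function name is hypothetical.

```python
import re

# Captures the policy path at the end of the DELETING warning shown above.
TNP_DELETE_WARNING_RE = re.compile(
    r"Failure received invoking listener "
    r"PolicyHostTransportNodeProfileNotificationListener "
    r"for change DELETING on resource (\S+)"
)


def tnp_paths_in_delete_warnings(nsxapi_log_text):
    """Return the sorted, de-duplicated resource paths named in the DELETING warnings."""
    paths = set()
    for line in nsxapi_log_text.splitlines():
        match = TNP_DELETE_WARNING_RE.search(line)
        if match:
            paths.add(match.group(1))
    return sorted(paths)
```

Each returned path has the form /infra/host-transport-node-profiles/<Transport-node-profile-ID>, which identifies the profile to compare against the clusters that share the DVS.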

Environment

NSX Security Only deployment

Impacted Versions: NSX 4.1.0 - 4.1.2.3
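
The impacted range can be checked programmatically when auditing several NSX Managers. This is a minimal Python sketch, assuming the version is available as a dotted string and that only its first four components matter. Ignoring any trailing build number is an assumption of this sketch, and the function names are hypothetical.

```python
# Bounds taken from the impacted range stated above: NSX 4.1.0 through 4.1.2.3.
FIRST_IMPACTED = (4, 1, 0, 0)
LAST_IMPACTED = (4, 1, 2, 3)


def parse_version(version):
    """Convert a dotted version string to a 4-tuple, padding missing parts with 0."""
    parts = [int(part) for part in version.strip().split(".")[:4]]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def is_impacted(version):
    """Return True when the version falls inside the impacted range."""
    return FIRST_IMPACTED <= parse_version(version) <= LAST_IMPACTED
```

For example, is_impacted("4.1.2.3") returns True, while is_impacted("4.1.2.4") returns False.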

Cause

If a DVS spans both clusters prepared for NSX Security Only and clusters that are not prepared for NSX, a race condition can trigger this issue. When the cleanup task runs against a cluster that is not yet prepared for NSX, it treats the Transport Node Profile (TNP) and Transport Zone (TZ) of the other cluster sharing the DVS as stale and attempts to delete them, along with the VM logical ports.

Resolution

Workaround:

  1. Ensure that the DVS spans only the Security Only-prepared clusters, and use a separate DVS for the non-prepared clusters.
  2. Once the DVS span is adjusted, move each impacted VM to another Distributed Portgroup (dvpg), then back to the original dvpg. To minimize the duration of dataplane impact, you can create and use temporary dvpgs with the same VLANs. After this, the logical ports should be present on the VMs and the issues should be resolved.
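
When many VMs are impacted, it can help to write out the moves from step 2 before making any change. This is a minimal Python sketch that only builds an ordered plan from a VM-to-dvpg mapping and does not call vCenter. The function name, the "-temp" suffix for temporary dvpgs, and the sample names are all hypothetical.

```python
def build_move_plan(vm_to_dvpg, temp_suffix="-temp"):
    """Return ordered (vm, from_dvpg, to_dvpg) moves: out to a temporary dvpg, then back."""
    plan = []
    for vm, dvpg in sorted(vm_to_dvpg.items()):
        temp_dvpg = dvpg + temp_suffix  # Assumed to exist already with the same VLAN.
        plan.append((vm, dvpg, temp_dvpg))  # First move: original -> temporary.
        plan.append((vm, temp_dvpg, dvpg))  # Second move: temporary -> original.
    return plan


if __name__ == "__main__":
    for vm, source, target in build_move_plan({"web-01": "pg-vlan10"}):
        print(f"move {vm}: {source} -> {target}")
```

Keeping both moves for a VM adjacent in the plan keeps the time each VM spends on the temporary dvpg short.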

 

Permanent Fix:

Upgrade to NSX 4.1.2.4 or later. These versions allow a DVS to span both NSX Security Only clusters and non-prepared clusters.