In NSX Security Only deployments, one or more of the following symptoms may be observed:
1. The slot 2 dvfilter is still applied to VMs in the "User Excluded Groups" exclusion list:
From an SSH session on the ESXi host, run the "summarize-dvfilter" command:
world 12398178 vmm0:<VM name> vcUuid:'50 2b 61 b1 d9 05 a2 f6-0b ec 51 f5 b1 58 9f ad'
port 100663396 <VM name>.eth0
vNic slot 2
name: nic-12398178-eth0-vmware-sfw.2 <----- Slot 2 is present. This VM is in the exclusion list.
agentName: vmware-sfw
state: IOChain Attached
2. Some VMs may have a slot 2 dvfilter but no rules applied:
world 12671939 vmm0:<VM name> vcUuid:'50 2b c0 0d aa 1b af ad-18 01 28 be 02 1f 2a d3'
port 100663408 <VM name>.eth0
vNic slot 2
name: nic-12671939-eth0-vmware-sfw.2 <--------- Slot 2 is present
/bin/vsipioctl getrules -f nic-12671939-eth0-vmware-sfw.2
No rules. <--------- No rules applied to VM.
3. The following error may appear in nsx-syslog.log for impacted VMs:
2023-12-14T18:31:17.756Z nsx-opsagent[2102584]: NSX 2102584 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2103210" level="ERROR" errorCode="MPA44205"] [PortOp] Port set for extraConfig property [com.vmware.port.extraConfig.logicalPort.id] operation failed, error code [bad0003]
For these VMs, the net-dvs -l output may also show:
com.vmware.port.extraConfig.portOp.pending = attach_port , propType = CONFIG
4. VMs may not have logical ports associated with their VIFs.
The following may also appear in /var/log/proton/nsxapi.log on the NSX Manager:
<Year>-<Month>-<Day><Time> WARN PolicyClusterResourcesCleanupTaskScheduler1 PolicyResourceChangeNotificationManager 90537 POLICY [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Failure received invoking listener PolicyHostTransportNodeProfileNotificationListener for change DELETING on resource /infra/host-transport-node-profiles/<Transport-node-profile-ID>
com.vmware.nsx.management.policy.transportnodecollection.exception.TransportNodeProfileException: null
at com.vmware.nsx.management.policy.transportnode.util.TransportNodeProfileUtilImpl.validateProfileNotInUse(TransportNodeProfileUtilImpl.java:534) ~[?:?]
at com.vmware.nsx.management.lcm.transportnode.service.TransportNodeProfileServiceImpl.preDelete(TransportNodeProfileServiceImpl.java:625) ~[?:?]
at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.preDelete(PolicyHostTransportNodeProfileNotificationListener.java:135) ~[?:?]
at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.handleResourceChange(PolicyHostTransportNodeProfileNotificationListener.java:81) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:161) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:134) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.deleteAndNotify(PolicyServiceImpl.java:1460) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody18(PolicyServiceImpl.java:736) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure19.run(PolicyServiceImpl.java:1) ~[?:?]
at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96cproceed(AbstractTransactionAspect.aj:67) ~[?:?]
at org.springframework.transaction.aspectj.AbstractTransactionAspect$AbstractTransactionAspect$1.proceedWithInvocation(AbstractTransactionAspect.aj:73) ~[?:?]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.aspects.IntentTransactionAspect.invokeWithinTransaction(IntentTransactionAspect.java:75) ~[?:?]
at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96c(AbstractTransactionAspect.aj:71) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody20(PolicyServiceImpl.java:718) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure21.run(PolicyServiceImpl.java:1) ~[?:?]
at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
at io.micrometer.core.aop.TimedAspect.processWithTimer(TimedAspect.java:119) ~[?:?]
at io.micrometer.core.aop.TimedAspect.ajc$inlineAccessMethod$io_micrometer_core_aop_TimedAspect$io_micrometer_core_aop_TimedAspect$processWithTimer(TimedAspect.java:1) ~[?:?]
at io.micrometer.core.aop.TimedAspect.timedMethod(TimedAspect.java:97) ~[?:?]
at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete(PolicyServiceImpl.java:718) ~[?:?]
at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResources(PolicyClusterResourcesCleanupTaskForSecurity.java:240) ~[?:?]
at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResourcesCreatedForSecurity(PolicyClusterResourcesCleanupTaskForSecurity.java:315) ~[?:?]
at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.cleanupStaleSecurityResources(PolicyClusterResourcesCleanupTaskForSecurity.java:136) ~[?:?]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_372]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_372]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_372]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
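The host-side checks in symptoms 1 to 3 can be scripted when many VMs need to be reviewed. The following Python is a minimal sketch, assuming the output of summarize-dvfilter and the contents of nsx-syslog.log have already been saved as text. The function names are hypothetical helpers written for this illustration and are not part of any NSX tooling.

```python
import re
from typing import List

# Matches slot 2 firewall filter names such as "nic-12398178-eth0-vmware-sfw.2".
FILTER_RE = re.compile(r"name:\s*(nic-\d+-eth\d+-vmware-sfw\.2)\b")
# Error code carried by the nsx-opsagent message quoted in symptom 3.
OPSAGENT_ERROR = 'errorCode="MPA44205"'


def slot2_filters(dvfilter_output: str) -> List[str]:
    """Return the slot 2 vmware-sfw filter names found in summarize-dvfilter output."""
    return FILTER_RE.findall(dvfilter_output)


def getrules_command(filter_name: str) -> str:
    """Build the vsipioctl command shown in symptom 2 for one filter."""
    return f"/bin/vsipioctl getrules -f {filter_name}"


def failed_port_ops(syslog_text: str) -> List[str]:
    """Return log lines that report the failed extraConfig port operation."""
    return [
        line
        for line in syslog_text.splitlines()
        if OPSAGENT_ERROR in line and "operation failed" in line
    ]


if __name__ == "__main__":
    sample = (
        "port 100663396 <VM name>.eth0\n"
        "vNic slot 2\n"
        "name: nic-12398178-eth0-vmware-sfw.2\n"
        "agentName: vmware-sfw\n"
        "state: IOChain Attached\n"
    )
    for name in slot2_filters(sample):
        # Run the printed command on the host to check whether rules are applied.
        print(getrules_command(name))
```

Each printed command can then be run on the ESXi host. A result of "No rules." for a VM that is not in the exclusion list corresponds to symptom 2.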
NSX Security Only deployment
Impacted Versions: NSX 4.1.0 - 4.1.2.3
This issue can be triggered by a race condition when a DVS spans both clusters prepared for NSX Security Only and non-prepared clusters. When the cleanup task runs on a cluster that is not yet prepared for NSX, it treats the Transport Node Profile and Transport Zone of the other cluster sharing the DVS as stale and tries to delete them, along with the VM logical ports.
Workaround:
Permanent Fix:
Upgrade to NSX 4.1.2.4 or later. These versions allow for a DVS to span both NSX Security Only clusters and non-prepared clusters.
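To check quickly whether a given NSX version falls in the impacted range, the comparison can be sketched in Python as below. This is a minimal illustration, assuming the version is a dotted numeric string such as "4.1.2.3". The function name is hypothetical, and any components beyond the fourth (for example a build number) are ignored.

```python
from typing import Tuple

# Impacted range stated in this article: NSX 4.1.0 through 4.1.2.3.
FIRST_IMPACTED = (4, 1, 0, 0)
LAST_IMPACTED = (4, 1, 2, 3)


def normalize(version: str) -> Tuple[int, ...]:
    """Convert a dotted version string to a 4-part tuple, padding with zeros."""
    parts = tuple(int(p) for p in version.strip().split("."))
    return (parts + (0, 0, 0, 0))[:4]


def is_impacted(version: str) -> bool:
    """Return True when the version lies inside the impacted range."""
    return FIRST_IMPACTED <= normalize(version) <= LAST_IMPACTED


if __name__ == "__main__":
    for v in ("4.1.0", "4.1.2.3", "4.1.2.4"):
        print(v, "impacted" if is_impacted(v) else "not impacted")
```

Tuple comparison is used so that the components are compared as numbers rather than as text.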