DFW rules are not matched as expected in NSX Security Only (NSX on DVPG) environments - Logical ports missing from VMs


Article ID: 367122


Products

VMware vDefend Firewall

VMware vDefend Firewall with Advanced Threat Prevention

VMware NSX

Issue/Introduction

In NSX Security Only deployments, one or more of the following symptoms may be observed:

 

1. A slot 2 dvfilter is still applied to VMs that are in the "User Excluded Groups" exclusion list:

From an SSH session on the ESXi host, run the "summarize-dvfilter" command:

world 12398178 vmm0:<VM name> vcUuid:'50 ## ## ## ## ## a2 f6-0b ec 51 f5 ## ## ## ad'

port 100663396 <VM name>.eth0

vNic slot 2

name: nic-12398178-eth0-vmware-sfw.2 <----- Slot 2 is present, although this VM is in the exclusion list.

agentName: vmware-sfw

state: IOChain Attached

 

2. Some VMs may have a slot 2 dvfilter but no rules applied:

world 12671939 vmm0:<VM name> vcUuid:'50 ## ## ## ## ## a2 f6-0b ec 51 f5 ## ## ## ad'
port 100663408 <VM name>.eth0
vNic slot 2
name: nic-12671939-eth0-vmware-sfw.2    <---------  Slot 2 is present

/bin/vsipioctl getrules -f nic-12671939-eth0-vmware-sfw.2

No rules.   <---------  No rules applied to VM.

 

3. You may see the following error in nsx-syslog.log for impacted VMs:

2023-12-14T18:31:17.756Z nsx-opsagent[2102584]: NSX 2102584 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2103210" level="ERROR" errorCode="MPA44205"] [PortOp] Port set for extraConfig property [com.vmware.port.extraConfig.logicalPort.id] operation failed, error code [bad0003]

For these VMs, the "net-dvs -l" output may also show a pending port attach operation:

com.vmware.port.extraConfig.portOp.pending = attach_port , propType = CONFIG

 

4. VMs may not have logical ports associated with their VIFs:

    1. Log in to the NSX UI
    2. Browse to Inventory > Virtual Machines
    3. Search for the VM and select "VIEW DETAILS" under Virtual Interface
    4. Observe that the Port is missing

5. The Distributed Portgroup (DVPG) that the impacted VMs are connected to may be in an "In Progress" state. Clicking the "In Progress" hyperlink shows a message indicating that NSX is attempting to remove the DVPG.
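
The host-side symptoms (1 to 3) can be checked mechanically once the command output and log text have been collected. The following is a minimal Python sketch, not a supported tool. It assumes the output of "summarize-dvfilter", "vsipioctl getrules", "net-dvs -l" and the contents of nsx-syslog.log have already been saved as text. The function names are hypothetical, and the matching relies only on the strings shown in the samples in this article.

```python
import re

# Filter names such as "nic-12398178-eth0-vmware-sfw.2" (slot 2, agent vmware-sfw).
SLOT2_FILTER = re.compile(r"name:\s*(nic-\d+-eth\d+-vmware-sfw\.2)\b")

# Strings taken from the sample nsx-syslog.log line and net-dvs output in this article.
ATTACH_ERROR = "com.vmware.port.extraConfig.logicalPort.id] operation failed"
PENDING_ATTACH = re.compile(
    r"com\.vmware\.port\.extraConfig\.portOp\.pending\s*=\s*attach_port"
)


def slot2_filters(summarize_dvfilter_text):
    """Return slot 2 vmware-sfw filter names found in summarize-dvfilter output."""
    return SLOT2_FILTER.findall(summarize_dvfilter_text)


def reports_no_rules(getrules_text):
    """True if 'vsipioctl getrules -f <filter>' output contains a 'No rules' line."""
    return any(
        line.strip().startswith("No rules")
        for line in getrules_text.splitlines()
    )


def attach_failures(syslog_text):
    """Return nsx-syslog.log lines reporting the failed logicalPort.id port set."""
    return [
        line for line in syslog_text.splitlines()
        if "MPA44205" in line and ATTACH_ERROR in line
    ]


def has_pending_attach(net_dvs_text):
    """True if net-dvs -l output still shows a pending attach_port operation."""
    return PENDING_ATTACH.search(net_dvs_text) is not None
```

For each name returned by slot2_filters, run "/bin/vsipioctl getrules -f <name>" on the host and pass its output to reports_no_rules. A filter that exists but reports no rules matches symptom 2.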

 

 

You may also observe the following warning in /var/log/proton/nsxapi.log on the NSX Manager:

 

<Year>-<Month>-<Day><Time>  WARN PolicyClusterResourcesCleanupTaskScheduler1 PolicyResourceChangeNotificationManager 90537 POLICY [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Failure received invoking listener PolicyHostTransportNodeProfileNotificationListener for change DELETING on resource /infra/host-transport-node-profiles/<Transport-node-profile-ID>
com.vmware.nsx.management.policy.transportnodecollection.exception.TransportNodeProfileException: null
 at com.vmware.nsx.management.policy.transportnode.util.TransportNodeProfileUtilImpl.validateProfileNotInUse(TransportNodeProfileUtilImpl.java:534) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.TransportNodeProfileServiceImpl.preDelete(TransportNodeProfileServiceImpl.java:625) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.preDelete(PolicyHostTransportNodeProfileNotificationListener.java:135) ~[?:?]
 at com.vmware.nsx.management.lcm.transportnode.service.PolicyHostTransportNodeProfileNotificationListener.handleResourceChange(PolicyHostTransportNodeProfileNotificationListener.java:81) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:161) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.dao.PolicyResourceChangeNotificationManager.notify(PolicyResourceChangeNotificationManager.java:134) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.deleteAndNotify(PolicyServiceImpl.java:1460) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody18(PolicyServiceImpl.java:736) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure19.run(PolicyServiceImpl.java:1) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96cproceed(AbstractTransactionAspect.aj:67) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect$AbstractTransactionAspect$1.proceedWithInvocation(AbstractTransactionAspect.aj:73) ~[?:?]
 at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.aspects.IntentTransactionAspect.invokeWithinTransaction(IntentTransactionAspect.java:75) ~[?:?]
 at org.springframework.transaction.aspectj.AbstractTransactionAspect.ajc$around$org_springframework_transaction_aspectj_AbstractTransactionAspect$1$2a73e96c(AbstractTransactionAspect.aj:71) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete_aroundBody20(PolicyServiceImpl.java:718) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl$AjcClosure21.run(PolicyServiceImpl.java:1) ~[?:?]
 at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.processWithTimer(TimedAspect.java:119) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.ajc$inlineAccessMethod$io_micrometer_core_aop_TimedAspect$io_micrometer_core_aop_TimedAspect$processWithTimer(TimedAspect.java:1) ~[?:?]
 at io.micrometer.core.aop.TimedAspect.timedMethod(TimedAspect.java:97) ~[?:?]
 at com.vmware.nsx.management.policy.policyframework.service.PolicyServiceImpl.delete(PolicyServiceImpl.java:718) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResources(PolicyClusterResourcesCleanupTaskForSecurity.java:240) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.handleStaleClusterResourcesCreatedForSecurity(PolicyClusterResourcesCleanupTaskForSecurity.java:315) ~[?:?]
 at com.vmware.nsx.management.policy.dvpg.scheduler.PolicyClusterResourcesCleanupTaskForSecurity.cleanupStaleSecurityResources(PolicyClusterResourcesCleanupTaskForSecurity.java:136) ~[?:?]
 at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[?:?]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_372]
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_372]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_372]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_372]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
 at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
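
To find out which Transport Node Profiles the cleanup task attempted to delete, the warning above can be searched for in a saved copy of nsxapi.log. This is a minimal Python sketch under that assumption. The function name is hypothetical, and the pattern is built only from the log message shown above.

```python
import re

# Matches the listener failure shown above and captures the profile path
# that the cleanup task was trying to delete.
TNP_DELETE_FAILURE = re.compile(
    r"Failure received invoking listener "
    r"PolicyHostTransportNodeProfileNotificationListener "
    r"for change DELETING on resource "
    r"(/infra/host-transport-node-profiles/\S+)"
)


def profiles_targeted_for_delete(nsxapi_log_text):
    """Return unique Transport Node Profile paths, in order of first appearance."""
    seen = []
    for match in TNP_DELETE_FAILURE.finditer(nsxapi_log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

Any path returned belongs to a profile that NSX treated as stale. The clusters using that profile are the ones to check for missing logical ports.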

Environment

NSX Security Only deployment

VMware NSX 4.1.0 through 4.1.2.3
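
The affected range can be expressed as a tuple comparison. The sketch below is illustrative only. It assumes the version is available as a dotted string and that any components after the fourth (for example a build number) can be ignored. The function names are hypothetical.

```python
def parse_version(version):
    """Convert a dotted version string into a 4-component tuple of ints.

    Missing components are treated as 0; components beyond the fourth
    (assumed to be build information) are ignored.
    """
    parts = [int(p) for p in version.split(".")[:4]]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def is_affected(version):
    """True if the version falls within 4.1.0 through 4.1.2.3, inclusive."""
    return (4, 1, 0, 0) <= parse_version(version) <= (4, 1, 2, 3)
```

With these definitions, is_affected("4.1.2.3") is true, while is_affected("4.1.2.4") and is_affected("4.2.0") are false.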

 

Cause

This issue can occur when a DVS spans both clusters that are prepared for NSX Security Only and clusters that are not prepared for NSX. It is triggered by a race condition: when the cleanup task runs for a cluster that is not yet prepared for NSX, it treats the Transport Node Profile and Transport Zone of the other clusters sharing the DVS as stale and tries to delete them, along with the VM logical ports.
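
The triggering condition is a DVS whose span mixes prepared and non-prepared clusters. The following Python sketch shows how such switches could be identified from inventory data. The input structures and names are hypothetical: it assumes you have already exported, from vCenter and NSX, which clusters each DVS spans and which clusters are prepared for Security Only.

```python
def mixed_span_switches(dvs_clusters, prepared_clusters):
    """Return DVS names that span both prepared and non-prepared clusters.

    dvs_clusters: dict mapping DVS name -> set of cluster names it spans.
    prepared_clusters: set of cluster names prepared for NSX Security Only.
    The result maps each at-risk DVS to its sorted non-prepared clusters.
    """
    at_risk = {}
    for dvs, clusters in dvs_clusters.items():
        prepared_here = clusters & prepared_clusters
        unprepared_here = clusters - prepared_clusters
        if prepared_here and unprepared_here:
            at_risk[dvs] = sorted(unprepared_here)
    return at_risk


# Hypothetical inventory: dvs-01 is shared, dvs-02 serves prepared clusters only.
inventory = {
    "dvs-01": {"cluster-a", "cluster-b"},
    "dvs-02": {"cluster-a"},
}
print(mixed_span_switches(inventory, {"cluster-a"}))
```

In this example only dvs-01 is reported, because it also spans the non-prepared cluster-b.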

 

Resolution

Workaround:

  1. Ensure that the DVS spans only the Security Only-prepared clusters, and use a separate DVS for the non-prepared clusters.
  2. Once the DVS span is adjusted, move all impacted VMs to another Distributed Portgroup (DVPG), then back to the original DVPG. To minimize the duration of dataplane impact, you can create and use temporary DVPGs with the same VLANs. After this, the logical ports should be present on the VMs and the issues should be resolved.
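
When many VMs are impacted, it can help to write out the order of moves in step 2 before making changes. The sketch below only builds a list of manual steps; it does not call vCenter or NSX. The input format, the "-tmp" suffix and the function name are hypothetical.

```python
def plan_portgroup_bounce(vnics, temp_suffix="-tmp"):
    """Build an ordered list of manual steps for the move-away/move-back workaround.

    vnics: iterable of (vm_name, portgroup_name, vlan_id) tuples, one per
    impacted vNIC. Each original DVPG gets one temporary DVPG on the same VLAN.
    """
    by_portgroup = {}
    for vm, portgroup, vlan in vnics:
        by_portgroup.setdefault((portgroup, vlan), []).append(vm)

    steps = []
    for (portgroup, vlan), vms in sorted(by_portgroup.items()):
        temp = portgroup + temp_suffix
        steps.append(f"create temporary dvpg {temp} with VLAN {vlan}")
        # Move every impacted VM away first, then move them all back.
        for vm in sorted(vms):
            steps.append(f"move {vm} from {portgroup} to {temp}")
        for vm in sorted(vms):
            steps.append(f"move {vm} from {temp} back to {portgroup}")
    return steps


for step in plan_portgroup_bounce([("vm-b", "pg-app", 100), ("vm-a", "pg-app", 100)]):
    print(step)
```

Keeping the temporary DVPG on the same VLAN as the original is what limits the dataplane impact to the duration of each move.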

 

Permanent Fix:

Upgrade to NSX 4.1.2.4 or later. These versions allow a DVS to span both NSX Security Only clusters and non-prepared clusters.

  • If the issue has not yet been encountered, upgrading to 4.1.2.4 prevents it from occurring.
  • If the issue has already occurred, upgrading to 4.1.2.4 prevents any further occurrences but does not repair the existing one. To address the existing issue, follow the workaround above before upgrading. After the upgrade, the issue will not return.
  • Upgrading to a 4.2.x release is the better choice if the goal is both to prevent future occurrences and to fix an existing occurrence through the upgrade itself, without needing the workaround.