Virtual machine on NSX-T ESXi transport nodes failed to get DFW rules re-applied after upgrade

Article ID: 318621

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

  • The ESXi host version is earlier than 7.0 and the NSX-T version is 3.0.0 or later.
  • You may have recently upgraded NSX-T from a version earlier than 3.0.0 to version 3.0.0 or later.
  • Workload VMs that had DFW rules applied before the upgrade have no DFW rules applied after it.
  • In NSX-T Manager, under Security > Distributed Firewall, the rules are present, but they are missing at the dataplane level.
  • To check this on the ESXi host (dataplane), first find the DFW slot 2 filter for the impacted VM:
summarize-dvfilter | grep <VM-Name> -A 5 -B 5
  • Then pass the slot 2 filter name to the next command, as shown below:
vsipioctl getrules -f nic-XXXXXXXX-eth0-vmware-sfw.2
  • For an impacted VM, the command returns:
No rules.
  • In the ESXi 'vmkernel.log' you see entries like the following:
2021-07-12T12:05:24.848Z cpu120:87115747)Delaying invoking VsipCpReportFilter for filter nic-XXXXXXXX-eth0-vmware-sfw.2 due to ongoing vmotion vM 0 or RTM 1
  • If this entry is not shortly followed by a 'Restore state called for filter nic-XXXXXXXX-eth0-vmware-sfw.2' entry, the VM is impacted.
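
The log check in the last two symptoms can be automated when many VMs need to be reviewed. The following is a minimal Python sketch, not a VMware tool: it scans lines from a copy of 'vmkernel.log' and lists every filter that has a 'Delaying invoking VsipCpReportFilter' entry with no later 'Restore state called for filter' entry. The function name find_unrestored_filters is hypothetical. Because "shortly" is not quantified here, the sketch only checks whether a restore entry follows at all, and it assumes the restore entry contains the quoted text followed by the filter name.

```python
import re

# Patterns are taken from the log text quoted in this article.
# Assumption: the filter name consists of letters, digits, '_', '-' and '.'.
DELAY_RE = re.compile(r"Delaying invoking VsipCpReportFilter for filter ([\w.-]+)")
RESTORE_RE = re.compile(r"Restore state called for filter ([\w.-]+)")


def find_unrestored_filters(log_lines):
    """Return filter names whose delay entry is not followed by a restore entry."""
    pending = set()
    for line in log_lines:
        delayed = DELAY_RE.search(line)
        if delayed:
            # Rule reporting for this filter was delayed (vMotion or RTM set).
            pending.add(delayed.group(1))
            continue
        restored = RESTORE_RE.search(line)
        if restored:
            # A later restore entry clears the pending state for that filter.
            pending.discard(restored.group(1))
    return sorted(pending)
```

To use it, copy 'vmkernel.log' from the host, open the copy, and pass the file object to find_unrestored_filters. Each returned filter name can then be confirmed with the vsipioctl getrules command shown above.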



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 2.x
VMware NSX-T Data Center 3.x

Cause

During the upgrade, the DFW module (VSIP) registers with the RTM to signal that an upgrade is in progress.
When this issue occurs, the RTM value is not reset to 0 once the upgrade is complete.
When an upgrade completes, a timer fires and sends an unregister event. This event should unregister the VSIP module from the RTM and set the RTM value back to 0. On an affected host, the timer callback logs the following error instead:

2021-07-12T11:56:43.338Z cpu140:2105832)RTM_ClientClearTimerCB:1524:[nsx@6876 comp="nsx-esx" subcomp="rtm" errorCode="ESX3"]RTM_ClientClearTimerCB: Error acquiring portset DvsPortset-3 : Not found


In the above log sample from 'vmkernel.log', the timer callback was unable to acquire the portset.
Because it could not acquire the portset, the unregister event was not sent.
Acquiring the portset lock fails if the portset changes between the time the portset is found and the time the lock is requested.
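
To confirm that a host hit this condition, the same log can be searched for the failed unregister attempt. The following is a minimal Python sketch under the assumption that every occurrence follows the format of the sample line above. The function name find_rtm_unregister_failures is hypothetical. It returns the timestamp, portset name, and error text for each match.

```python
import re

# Pattern derived from the single log sample in this article.
# Assumption: other occurrences use the same
# "Error acquiring portset <name> : <reason>" layout.
PORTSET_ERR_RE = re.compile(
    r"^(\S+) .*RTM_ClientClearTimerCB: Error acquiring portset (\S+) : (.*)$"
)


def find_rtm_unregister_failures(log_lines):
    """Return (timestamp, portset, reason) for each failed portset acquisition."""
    failures = []
    for line in log_lines:
        match = PORTSET_ERR_RE.search(line.rstrip("\n"))
        if match:
            timestamp, portset, reason = match.groups()
            failures.append((timestamp, portset, reason.strip()))
    return failures
```

A non-empty result on a host where VMs also show 'No rules.' is consistent with the cause described here, but it is not proof on its own.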

Resolution

This issue is resolved in NSX-T 3.1.3.1.

Workaround:
Migrate the VM to another host.
After the migration, the VM should get its DFW rules applied correctly again.