Firewall rules missing from NSX environment due to RTM not unloading.

Article ID: 373932

Products

VMware vDefend Firewall

Issue/Introduction

NSX is deployed in the environment.

Firewall rules are not taking effect on some VMs.

Checking a VM's filter live on the ESXi host with vsipioctl getrules -f <filter_name> reports "no_rules". In a support bundle, the filters without rules are not referenced in vsipioctl_info.sh.txt.
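When several VMs are affected, the per-filter check above can be repeated in a loop. The following is a minimal sketch for the ESXi shell. It assumes, as described above, that vsipioctl getrules -f prints the literal string "no_rules" for a filter with no rules. The function name filters_without_rules is hypothetical, and the filter names are supplied by you (for example, copied from summarize-dvfilter output).

```shell
#!/bin/sh
# Hypothetical helper: print every filter name (passed as arguments) whose
# rule listing from `vsipioctl getrules -f` contains the string "no_rules".
filters_without_rules() {
    for filter in "$@"; do
        # Capture the rule listing for this filter.
        out=$(vsipioctl getrules -f "$filter")
        case "$out" in
            *no_rules*) printf '%s\n' "$filter" ;;
        esac
    done
}
```

Example: filters_without_rules nic-987654321-eth0-vmware-sfw.2 prints the filter name only if it has no rules.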

The RTM module used during the upgrade process is still loaded. This can be confirmed live on the ESXi host with the CLI command vmkload_mod -l | grep -i rtm, or in a support bundle in /commands/vmkload_mod_-v10--l.txt.

Example output if loaded: nsxt-rtm-123456789. If the module is not loaded, there is no output.
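The same check can be applied uniformly to live output or to the support-bundle file. This is a sketch only; it assumes the module name is the first whitespace-separated column of the vmkload_mod -l listing, and the function name rtm_module_loaded is hypothetical. Matching on the "nsxt-rtm-" prefix is slightly stricter than grep -i rtm, which could also match unrelated names containing "rtm".

```shell
#!/bin/sh
# Hypothetical helper: read `vmkload_mod -l` output on stdin and print the
# name of any loaded NSX RTM module (first column starting with "nsxt-rtm-").
# Exit status is 0 if at least one RTM module is listed, 1 otherwise.
rtm_module_loaded() {
    awk '$1 ~ /^nsxt-rtm-/ { print $1; found = 1 } END { exit !found }'
}
```

Usage: vmkload_mod -l | rtm_module_loaded on the host, or rtm_module_loaded < BUNDLE_DIR/commands/vmkload_mod_-v10--l.txt for an extracted bundle (BUNDLE_DIR is a placeholder).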

To work around the problem, vMotion the affected VMs to a host without the issue.

Environment

NSX 4.X

Cause

This is caused by the RTM module not unloading. The vMotion is waiting for RTM to restore state; however, RTM reports that there is no state to restore, whereas the state is expected to be restored from dvfilter.

In vmkernel.log:

2024-07-04T08:39:46.345Z In(182) vmkernel: cpu64:76005444)Delaying invoking VsipCpReportFilter for filter nic-987654321-eth0-vmware-sfw.2 due to ongoing vmotion vM 0 or RTM 1

2024-07-04T08:39:46.732Z In(182) vmkernel: cpu64:71751558)rtmRestore:270:[nsx@6876 comp="nsx-esx" subcomp="rtm"]PortID 0x4000064 not part of RTM hash. SKIP RTM Restore

NOTE: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
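To check a host for these two signatures without reading the whole log, a small filter can count them. This is a sketch, assuming the message text matches the excerpts above (timestamps and IDs will differ); the function name rtm_log_evidence is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the two vmkernel.log signatures quoted above.
# Reads a log on stdin and prints one count per signature.
rtm_log_evidence() {
    awk '
        /due to ongoing vmotion/ && /RTM 1/ { delayed++ }
        /not part of RTM hash/              { skipped++ }
        END {
            printf "delayed_filter_reports=%d\n", delayed
            printf "rtm_restore_skipped=%d\n", skipped
        }'
}
```

Usage: rtm_log_evidence < /var/log/vmkernel.log. Non-zero counts for both lines are consistent with the cause described here, but they are not proof on their own; confirm with the module and vprobe checks.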

Resolution

To resolve this issue, unload the RTM module and reboot the affected ESXi hosts using the steps below:

1: Confirm whether the RTM module is loaded:

vmkload_mod -l | grep -i rtm

Example output: nsxt-rtm-123456789

Note: The RTM build number may be different in your environment.

2: Unload the RTM module:

vmkload_mod -u nsxt-rtm-123456789
Module nsxt-rtm-123456789 successfully unloaded
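Because vmkload_mod -u acts on whatever name it is given, a thin wrapper can guard against a typo unloading a different module (for example nsxt-vsip). This is a sketch; the wrapper name unload_rtm_module and the exit code 2 for a rejected name are hypothetical choices, not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical wrapper around `vmkload_mod -u`: refuses to unload anything
# whose name does not start with "nsxt-rtm-", otherwise unloads it and
# returns vmkload_mod's exit status.
unload_rtm_module() {
    name=$1
    case "$name" in
        nsxt-rtm-*) ;;
        *)
            echo "refusing to unload '$name': not an nsxt-rtm module" >&2
            return 2
            ;;
    esac
    vmkload_mod -u "$name"
}
```

Usage: unload_rtm_module nsxt-rtm-123456789, using the exact name reported in step 1.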

3.1: Compose the script /tmp/rtm.emt. Replace 123456789 with the build number from step 1.


[root@ESXi01:~] more /tmp/rtm.emt 
VMKLoad() {
        unsigned long *x;
        x = sym2addr("nsxt-rtm-123456789.already_called");
        printf("Already_called:%lu\n", *x);
}

3.2: Compose the script /tmp/vsip.emt. Replace 123456789 with the build number from step 1.


[root@ESXi01:~] more /tmp/vsip.emt 
VMKLoad() {
        unsigned long *x;
        x = sym2addr("nsxt-vsip-123456789.VSIPExpectRestoreRTM");
        printf("RTM:%lu\n", *x);
}
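Instead of editing the build number into both scripts by hand, they can be generated from it. This sketch writes the same two scripts shown in steps 3.1 and 3.2; it assumes the build number is purely numeric, as in the example, and the function name write_emt_scripts is hypothetical. The here-documents are unquoted so that $build expands, while the \n inside printf is left untouched by the shell.

```shell
#!/bin/sh
# Hypothetical helper: write rtm.emt and vsip.emt (contents as in steps 3.1
# and 3.2) with the build number substituted.
# Usage: write_emt_scripts BUILD [DIR]   (DIR defaults to /tmp)
write_emt_scripts() {
    build=$1
    dir=${2:-/tmp}
    # Reject empty or non-numeric build numbers.
    case "$build" in
        ''|*[!0-9]*) echo "build number must be numeric" >&2; return 2 ;;
    esac
    cat > "$dir/rtm.emt" <<EOF
VMKLoad() {
        unsigned long *x;
        x = sym2addr("nsxt-rtm-$build.already_called");
        printf("Already_called:%lu\n", *x);
}
EOF
    cat > "$dir/vsip.emt" <<EOF
VMKLoad() {
        unsigned long *x;
        x = sym2addr("nsxt-vsip-$build.VSIPExpectRestoreRTM");
        printf("RTM:%lu\n", *x);
}
EOF
}
```

Usage: write_emt_scripts 123456789 writes /tmp/rtm.emt and /tmp/vsip.emt.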


4.1: Run vprobe /tmp/rtm.emt -d 5. The output should be Already_called:1.


[root@ESXi01:~] vprobe /tmp/rtm.emt -d 5
Already_called:1
OK.


4.2: Run vprobe /tmp/vsip.emt -d 5. The output is either RTM:0 or RTM:1.

[root@ESXi01:~] vprobe /tmp/vsip.emt -d 5
RTM:1
OK.

These results mean the following:

RTM:0 - this host is OK
RTM:1 - this host requires a reboot
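The interpretation above can be turned into an exit status, which is convenient when checking many hosts. This is a sketch, assuming the vprobe output contains a line beginning with RTM: followed by a number, as in step 4.2; the function name rtm_host_action and the exit codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `vprobe /tmp/vsip.emt -d 5` output on stdin and
# print the action from the table above.
# Exit status: 0 = host OK, 1 = reboot required, 2 = output not recognised.
rtm_host_action() {
    # Take the number from the first line that starts with "RTM:".
    value=$(sed -n 's/^RTM:\([0-9][0-9]*\).*/\1/p' | head -n 1)
    case "$value" in
        0) echo "OK"; return 0 ;;
        1) echo "REBOOT"; return 1 ;;
        *) echo "UNKNOWN"; return 2 ;;
    esac
}
```

Usage: vprobe /tmp/vsip.emt -d 5 | rtm_host_action.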

5: For hosts with "RTM:1", vMotion all VMs to hosts with "RTM:0", then reboot the hosts with "RTM:1".
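When step 4.2 has been run on several hosts, the results can be sorted into hosts to evacuate and reboot and hosts that can receive VMs. This is a planning sketch only: it assumes you have collected one line per host in the hypothetical form "HOSTNAME RTM:N", and the function name plan_rtm_reboots is hypothetical. The vMotion operations themselves are still performed as usual (for example, from vCenter).

```shell
#!/bin/sh
# Hypothetical planning helper for step 5. Input (stdin): one line per host
# in the form "HOSTNAME RTM:N". Prints hosts to evacuate and reboot, and
# hosts that can receive the VMs.
plan_rtm_reboots() {
    awk '
        $2 == "RTM:1" { reboot = reboot " " $1 }
        $2 == "RTM:0" { target = target " " $1 }
        END {
            print "evacuate_and_reboot:" reboot
            print "vmotion_targets:" target
        }'
}
```

If the vmotion_targets line is empty, there is no host without the issue to move VMs to, and the reboots need to be scheduled differently.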

6: After the reboot, run "vprobe /tmp/vsip.emt -d 5" to verify that the output is RTM:0.