Symptoms:
- NSX version installed: NSX-T Data Center 3.1.3 or 3.1.3.x
- The ESXi host fails with a purple diagnostic screen (PSOD): "#PF Exception 14 in world xxxx:nsx-cfgagent"
- This issue is observed when bulk vMotion operations occur in the NSX-T environment. Probable scenarios include:
- Migration of multiple VMs, each with multiple vNICs
- Multiple IP sets in CIDR notation configured per rule
- Multiple rules containing the same IP sets
- VMs from a non-upgraded NSX-T host migrated to an upgraded NSX-T host
- These scenarios may lead to a PSOD with one of the following backtraces:
Panic Message: @BlueScreen: #PF Exception 14 in world 58524293:nsx-cfgagent IP 0x4180320430b3 addr 0x10
Backtrace:
0x451ac5c1b158:[0x4180320430b3]rn_walktree@(nsxt-vsip-19068435)#<None>+0x5b stack: 0x43261e082c18, 0x451ac5c1b288, 0x43261e082c18, 0x418031fb0890, 0x0
OR
Panic Message: @BlueScreen: #PF Exception 14 in world 99566160:NetWorld-VM- IP 0x41801bf58a33 addr 0x0
Backtrace:
2021-12-01T01:38:16.127Z cpu19:14797741)0x45393ac18ec0:[0x42002a413ef2]pfp_policy_lookup@(nsxt-vsip-18504670)#<None>+0xcbe stack: 0x43314afbd860
2021-12-01T01:38:16.151Z cpu19:14797741)0x45393ac19450:[0x42002a3b3dc3]pf_test_tcp@(nsxt-vsip-18504670)#<None>+0x5ac stack: 0x45dab513d7a8
2021-12-01T01:38:16.175Z cpu19:14797741)0x45393ac1abc0:[0x42002a3bcd87]pf_validate_state@(nsxt-vsip-18504670)#<None>+0x6c0 stack: 0x14
2021-12-01T01:38:16.201Z cpu19:14797741)0x45393ac1af00:[0x42002a3bd29b]pf_validate_session@(nsxt-vsip-18504670)#<None>+0x158 stack: 0x45393ac1af42
2021-12-01T01:38:16.227Z cpu19:14797741)0x45393ac1afd0:[0x42002a3bebe8]pf_test_state_tcp@(nsxt-vsip-18504670)#<None>+0x389 stack: 0x45da00000000
2021-12-01T01:38:16.251Z cpu19:14797741)0x45393ac1b0d0:[0x42002a3c53e7]pf_test@(nsxt-vsip-18504670)#<None>+0x25c4 stack: 0x45393ac1b160
2021-12-01T01:38:16.273Z cpu19:14797741)0x45393ac1b2e0:[0x42002a44bfb7]PFFilterPacket@(nsxt-vsip-18504670)#<None>+0x754 stack: 0x0
2021-12-01T01:38:16.298Z cpu19:14797741)0x45393ac1b5b0:[0x42002a372dd3]VSIPDVFProcessPacketsInt@(nsxt-vsip-18504670)#<None>+0x450 stack: 0x0
2021-12-01T01:38:16.324Z cpu19:14797741)0x45393ac1bc10:[0x420028f353b6][email protected]#v2_8_0_0+0xa3 stack: 0x1
2021-12-01T01:38:16.346Z cpu19:14797741)0x45393ac1bc50:[0x420028608cbd]IOChain_Resume@vmkernel#nover+0x2e6 stack: 0x43057b6ab3f0
2021-12-01T01:38:16.365Z cpu19:14797741)0x45393ac1bcf0:[0x42002864c946]Port_InputResume@vmkernel#nover+0xbf stack: 0x2
2021-12-01T01:38:16.386Z cpu19:14797741)0x45393ac1bd40:[0x4200286b8b9d]Vmxnet3VMKDevTQDoTx@vmkernel#nover+0xeca stack: 0x80
2021-12-01T01:38:16.407Z cpu19:14797741)0x45393ac1bee0:[0x4200286c1e5b]Vmxnet3VMKDev_AsyncTx@vmkernel#nover+0xb0 stack: 0x330
2021-12-01T01:38:16.427Z cpu19:14797741)0x45393ac1bf50:[0x4200286378a0]NetWorldPerVMCB@vmkernel#nover+0x5b9 stack: 0x0
2021-12-01T01:38:16.447Z cpu19:14797741)0x45393ac1bfe0:[0x420028781e69]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
2021-12-01T01:38:16.467Z cpu19:14797741)0x45393ac1c000:[0x4200284c2c23]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
- The VMkernel log file /var/run/log/vmkernel.log on the ESXi host shows entries similar to the following:
2022-01-05T17:17:05.259Z cpu39:2098345)ImportStateTLV entry type 12, len 52, cnt 1
2022-01-05T17:17:05.259Z cpu39:2098345)Importing from source version RELEASEbuild-19068435
2022-01-05T17:17:05.259Z cpu39:2098345)ImportStateTLV entry type 1, len 566915, cnt 2081
2022-01-05T17:17:05.259Z cpu39:2098345)pfr_unroute_kentry: delete failed.
2022-01-05T17:17:05.260Z cpu39:2098345)pfr_unroute_kentry: delete failed.
2022-01-05T17:17:05.260Z cpu39:2098345)pfr_unroute_kentry: delete failed.
2022-01-05T17:17:05.260Z cpu39:2098345)pfp_add_table_one_addr: failed to add ke
rn_addmask: mask impossibly already in tree2022-01-05T17:17:05.262Z cpu4:26296491)pfp_add_addr_with_rule: failed
2022-01-05T17:17:05.262Z cpu4:26296491)pfp_add: failed for dst
2022-01-05T17:17:05.262Z cpu4:26296491)pfp_del_addr_with_rule: cannot find matching entry flags 2
2022-01-05T17:17:05.262Z cpu4:26296491)pfp_del_port: fpp NULL, port 443, flags 8
2022-01-05T17:17:05.262Z cpu4:26296491)pfp_del_ruleid: rule not found 26238 rs 1
2022-01-05T17:17:05.262Z cpu4:26296491)pfioctl: failed to add rules (0)
2022-01-05T17:17:05.262Z cpu4:26296491)VSIPConversionCreateRuleSet: Cannot insert #1060 rule 26238: 22
2022-01-05T17:17:05.341Z cpu39:2098345)ImportStateTLV entry type 2, len 2086977, cnt 3
2022-01-05T17:17:05.341Z cpu4:26296491)pf_rollback_rules: rs_num: 1, anchor: mainrs
2022-01-05T17:17:05.342Z cpu4:26296491)pf_rollback_rules: rs_num: 2, anchor: mainrs
2022-01-05T17:17:05.342Z cpu4:26296491)pf_rollback_rules: rs_num: 4, anchor: mainrs
2022-01-05T17:17:05.342Z cpu4:26296491)pf_rollback_rules: rs_num: 5, anchor: mainrs
2022-01-05T17:17:05.342Z cpu4:26296491)pf_rollback_rules: rs_num: 6, anchor: mainrs
2022-01-05T17:17:05.342Z cpu39:2098345)configured filter nic-28500556-eth0-vmware-sfw.2
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 5, len 804, cnt 20
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 6, len 24, cnt 0
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 13, len 24, cnt 0
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 3, len 17874, cnt 35
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 9, len 24, cnt 0
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 11, len 45, cnt 4
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 8, len 24, cnt 0
2022-01-05T17:17:05.370Z cpu39:2098345)ImportStateTLV entry type 7, len 2224, cnt 5
2022-01-05T17:17:05.370Z cpu39:2098345)Importing succeeded
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
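To triage several hosts, you could turn the strings from the backtraces and vmkernel.log excerpts above into a simple text search. The Python sketch below is illustrative only. The script, the signature list, and the function names `scan_lines` and `is_affected_version` are hypothetical and are not a VMware tool. A match suggests, but does not confirm, this issue.

The sketch uses `re.search` rather than line-anchored matching. In vmkernel.log, entries from different CPUs can be interleaved or fused onto one line, so a signature may appear mid-line.

```python
import re
import sys
from collections import Counter

# Hypothetical signature list built from the PSOD backtraces and
# vmkernel.log excerpts in this article; not an official detection rule.
SIGNATURES = {
    "psod_rn_walktree": re.compile(r"rn_walktree@\(nsxt-vsip-\d+\)"),
    "psod_pfp_policy_lookup": re.compile(r"pfp_policy_lookup@\(nsxt-vsip-\d+\)"),
    "unroute_delete_failed": re.compile(r"pfr_unroute_kentry: delete failed"),
    "addmask_already_in_tree": re.compile(r"rn_addmask: mask impossibly already in tree"),
    "rule_insert_failed": re.compile(r"VSIPConversionCreateRuleSet: Cannot insert"),
    "rule_rollback": re.compile(r"pf_rollback_rules:"),
}


def is_affected_version(version: str) -> bool:
    """Return True for NSX-T 3.1.3 or any 3.1.3.x version string."""
    return version == "3.1.3" or version.startswith("3.1.3.")


def scan_lines(lines):
    """Count, per signature, how many lines contain it (searched anywhere in the line)."""
    hits = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def main(path: str) -> int:
    # errors="replace" tolerates stray non-UTF-8 bytes in copied log files.
    with open(path, encoding="utf-8", errors="replace") as f:
        hits = scan_lines(f)
    for name in SIGNATURES:
        print(f"{name}: {hits[name]}")
    # Like grep: exit 0 if anything matched, 1 otherwise.
    return 0 if hits else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

As an example, you could run it against a copy of the log taken off the host: `python3 scan_nsx_psod.py vmkernel.log` (the file name is hypothetical). Repeated `pfr_unroute_kentry` or `rn_addmask` hits close to a `VSIPConversionCreateRuleSet` failure resemble the pattern shown above.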
Sample PSOD Screenshot:![PSOD_cfagent.PNG](https://api-broadcomcms-software.wolkenservicedesk.com/attachment/get_attachment_content?uniqueFileId=1512722305230)