DFW rules are missing in host transport nodes.
*Relevant log’s location*:
In CCP log, it shows that DFW rules in "RULE_SECTION" are deleted during a Full Sync.
In directory nsx_manager/var/log/cloudnet
$ grep "Full sync entity update for RULE*" nsx-ccp-20210727*
nsx-ccp-20210727-010852714.log:2021-07-26T19:03:51.310Z INFO MessageProcessor FullSyncTransactionCreator - - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="adapter-mp"] Full sync entity update for RULE_SECTION: Added 0, Deleted 0, Updated 7374
nsx-ccp-20210727-010852714.log:2021-07-26T19:03:51.480Z INFO MessageProcessor FullSyncTransactionCreator - - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="adapter-mp"] Full sync entity update for RULE: Added 879, Deleted 1378, Updated 56935
Note that these messages are generic and indicate that DFW rules are deleted. The above messages only show a problem if no DFW rule is deleted in NSX-T Manager.
The following messages are related to the race conditions causing stale configuration in in-memory cache.
1. In Proton log, it shows that a Full Sync started at 2021-07-29T10:55:19:
nsx_manager/var/log/proton
2021-07-29T10:55:19.237Z INFO FullSyncMsgLoader ProfileFullSyncMessageProvider - UPM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Converting Profile [type = SystemHealthPluginProfile, id = UpmProfile/c######f-0##5-4##a-b##9-f##########9] to protobuf
2021-07-29T10:55:19.237Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message UpmProfile/UpmProfile/3######0-d##9-4##4-8##0-7##########3
2021-07-29T10:55:19.237Z INFO FullSyncMsgLoader ProfileFullSyncMessageProvider - UPM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Converting Profile [type = SystemHealthPluginProfile, id = UpmProfile/3######0-d##9-4##4-8##0-7##########3] to protobuf
2021-07-29T10:55:19.237Z INFO FullSyncMsgLoader ProfileFullSyncMessageProvider - UPM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Converting Profile [type = SystemHealthPluginProfile, id = UpmProfile/a######b-0##d-4##e-a##8-f##########8] to protobuf
2. In Proton log, it shows that there was a new Sync Leader at 2021-07-29T10:55:58
/nsx_manager/var/log/proton
2021-07-29T10:55:58.850Z INFO pool-410-thread-1 WaitingForLeadershipLostAckState - - [nsx@6876 comp="gmle-client" level="INFO" subcomp="WaitingForLeadershipLostAckState"] Entering WaitingForLeadershipLostAckState for lease id <UUID> service TraceflowCacheManager on member <UUID> of group <UUID> from AcquiredState : LeadershipLease{serviceName=TraceflowCacheManager, leaderId=<UUID>, leaseVersion=6560, revocationCount=0, leaseId=<UUID>}
3. In Proton log, it shows that Statesync tried to stop loading messages to in-memory cache
proton$ grep 2021-07-29T10:55:59.845 nsxapi.*
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor StateSyncShardingCoordinator - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting shutdown for leave() processing
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor CorfuDbTransactionLogHandler - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Stopping Corfu Transaction Log Processor
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor AbstractFullSyncCommunicator - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] FullSyncService ccp full sync communicator bean is stopping.
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Stopping FullStateSyncDataBuilder
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Clearing cache
nsxapi.1.log:2021-07-29T10:55:59.845Z INFO StateSyncShardingCoordinator-leadership-executor FullStateSyncCache - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Full state sync cache stats.
4. In Proton log, it shows that Fullsync still loaded messages for 5 more seconds. Then, a fatal error occurred at 2021-07-29T10:56:04. lease note that fatal error does not always occur.
/nsx_managervar/log/proton
proton$ grep AbstractFullStateSyncDataBuilder nsxapi.1.log | grep 021-07-29T10:55:59 | head -5
2021-07-29T10:55:59.015Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:55:59.046Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:55:59.071Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:55:59.094Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:55:59.114Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
hdixon@gss-prd-csp-1:proton$ grep AbstractFullStateSyncDataBuilder nsxapi.1.log | tail -10
2021-07-29T10:56:04.019Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:56:04.043Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:56:04.062Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading message FirewallSection/FirewallSection/<UUID>
2021-07-29T10:56:04.067Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] closeTransaction
2021-07-29T10:56:04.067Z FATAL FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" errorCode="MP4705" level="FATAL" subcomp="manager"] Unexpected error in full state sync processor
at com.vmware.nsx.management.statesync.full.AbstractFullStateSyncDataBuilder.loadMessage(AbstractFullStateSyncDataBuilder.java:382) ~[ccp-1.0.jar:?]
at com.vmware.nsx.management.statesync.full.AbstractFullStateSyncDataBuilder.loadSpecial(AbstractFullStateSyncDataBuilder.java:363) ~[ccp-1.0.jar:?]
at com.vmware.nsx.management.statesync.full.AbstractFullStateSyncDataBuilder.loadData(AbstractFullStateSyncDataBuilder.java:275) ~[ccp-1.0.jar:?]
at com.vmware.nsx.management.statesync.full.AbstractFullStateSyncDataBuilder.run(AbstractFullStateSyncDataBuilder.java:217) ~[ccp-1.0.jar:?]
2021-07-29T10:56:04.067Z INFO FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Exiting MessageLoader
VMware NSX-T Data Center 3.x