NAT rules stop working on NSX-T T0 or T1s.
NAT rules appear to be missing from the T0/T1 interface when checked via the CLI.
This issue affects all NSX-T versions up to 3.1.x.
Relevant log location:
In /var/log/proton/nsxapi.log, the following message indicates that a NullPointerException (NPE) is thrown when the StateSync Sharding leader syncs the NAT configuration with the other two managers:
2021-10-26T05:09:14.690Z ERROR FullSyncMsgLoader AbstractFullStateSyncDataBuilder - - [nsx@6876 comp="nsx-manager" errorCode="MP4717" level="ERROR" subcomp="manager"] Error happened when provider com.vmware.nsx.management.firewall.sync.ccp.NatSectionAndRuleSyncMessageProvider@26357244 is converting messages from id LogicalRouter/xxxxx-xxxx-xxxx-xxxx-xxxxxxx, skip it.
java.lang.NullPointerException: null
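To check a copy of nsxapi.log for this signature, a short script can help. The following is a minimal sketch, not a supported tool: the function name find_nat_sync_npe is hypothetical, and it assumes the NullPointerException line directly follows the error line, as in the excerpt above.

```python
def find_nat_sync_npe(log_lines):
    """Return (line_number, line) pairs for NAT sync errors followed by an NPE.

    Assumption: the java.lang.NullPointerException line immediately follows
    the error line, as in the log excerpt from this article.
    """
    lines = [line.rstrip("\n") for line in log_lines]
    hits = []
    for index, line in enumerate(lines):
        is_nat_sync_error = (
            'errorCode="MP4717"' in line
            and "NatSectionAndRuleSyncMessageProvider" in line
        )
        if not is_nat_sync_error:
            continue
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if "java.lang.NullPointerException" in next_line:
            # Report 1-based line numbers, matching what a pager would show.
            hits.append((index + 1, line))
    return hits
```

It can be called with an open file handle, for example find_nat_sync_npe(open("/var/log/proton/nsxapi.log")). An empty result means only that this exact two-line pattern was not found in that file.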
Check if there are any NAT rules with logging=null in Corfu.
In a shell on a manager, run the following command to write all NAT rules to a file:
/opt/vmware/bin/corfu_tool_runner.py -r nsx-manager -t NatRule > /tmp/natrule.txt
Then use the following command to find any rules with logging=null:
less /tmp/natrule.txt | grep "ruleId\|logging"
ruleId=17422,
logging=<null>,
ruleId=2231,
logging=<null>,
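On a long dump, it is easy to mismatch ruleId and logging lines by eye. The sketch below pairs them and returns only the affected rule IDs. It is an illustration, not part of the product: the function name rules_with_null_logging is hypothetical, and it assumes that each rule's ruleId field appears before its logging field in the dump, as in the sample output above.

```python
import re

RULE_ID_PATTERN = re.compile(r"ruleId=(\d+)")
LOGGING_PATTERN = re.compile(r"logging=([^,\s]+)")


def rules_with_null_logging(dump_text):
    """Return the ruleIds whose logging field is <null>.

    Assumption: within each rule record, ruleId is printed before logging,
    which matches the sample grep output in this article.
    """
    affected = []
    current_rule_id = None
    for line in dump_text.splitlines():
        rule_match = RULE_ID_PATTERN.search(line)
        if rule_match:
            current_rule_id = int(rule_match.group(1))
        logging_match = LOGGING_PATTERN.search(line)
        if logging_match and current_rule_id is not None:
            if logging_match.group(1) == "<null>":
                affected.append(current_rule_id)
            # The logging field closes this record for our purposes.
            current_rule_id = None
    return affected
```

For example, rules_with_null_logging(open("/tmp/natrule.txt").read()) returns a list of rule IDs to review. If the real dump orders the fields differently, the pairing logic would need to be adjusted.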
Because the problem is specific to StateSync, and StateSync is deprecated in 3.2, upgrade to NSX-T 3.2 or later to resolve this issue.
Workaround:
1. SSH to all NSX Managers as root.
2. Use the following two commands to find any NAT rules with logging=null:
/opt/vmware/bin/corfu_tool_runner.py -r nsx-manager -t NatRule > /tmp/natrule.txt
less /tmp/natrule.txt | grep "ruleId\|logging"
3. Change the value of logging to false on every rule listed in the Step 2 output.