NSX-T objects appear to be missing after proton or policy service restarts
Article ID: 316649
Products
VMware NSX
Issue/Introduction
Symptoms:
NSX-T version is 2.4.x
NSX Manager logs (nsxapi.log, policy.log, syslog) show that the NSX Manager or Policy service restarted, either manually or automatically. The messages are similar to:
- NSX Manager service automatic restart (nsxapi.log):
  2019-11-06T09:34:44.749Z INFO CorfuDB-DCN-Publisher-0 ContainerConfigServiceImpl - - [nsx@6876 comp="nsx-manager" subcomp="manager"] Restart application now.
- Policy service manual restart (syslog):
  <182>1 2020-01-07T10:38:03.664Z nsx-mngr-01 NSX 9481 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="root" level="INFO"] CMD: restart service policy
NSX Manager logs (nsxapi.log or policy.log) show that a trimmed exception occurred while the service was restarting. The messages are similar to:
  2019-11-06T09:36:58.379Z WARN pool-2-thread-1 FastObjectLoader - applyForEachAddress[171952709, start=171952684] address is trimmed
  2019-11-06T09:36:58.387Z WARN pool-2-thread-1 FastObjectLoader - applyForEachAddress[171952709, start=171957863] address is trimmed
NSX-T objects (NSGroups, firewall sections and rules, logical switches, logical ports, etc.) appear to be missing from the NSX Manager UI and from the results of API calls such as:
  GET /api/v1/ns-groups
  GET /api/v1/firewall/sections/summary
  GET /api/v1/logical-switches/status
  GET /api/v1/logical-ports
NOTE: The impact of this symptom depends on which objects are missing. Network traffic can be impacted if objects that the traffic depends on are missing.
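The log symptoms above can be checked together with a short script. The following is a minimal sketch, not a supported tool: it assumes the log lines carry the same message text and timestamp format as the samples shown in this article, and the function names `find_events` and `trim_follows_restart` are hypothetical.

```python
import re

# Message fragments copied from the sample log lines in this article.
# Assumption: real logs use the same wording.
SIGNATURES = {
    "automatic_restart": "Restart application now.",
    "manual_restart": "CMD: restart service",
    "trimmed": "address is trimmed",
}

# UTC timestamp in the form used by the sample lines, e.g. 2019-11-06T09:34:44.749Z
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def find_events(lines):
    """Return (timestamp, kind) tuples for lines matching a known signature."""
    events = []
    for line in lines:
        for kind, needle in SIGNATURES.items():
            if needle in line:
                match = TIMESTAMP.search(line)
                events.append((match.group(0) if match else None, kind))
                break
    return events


def trim_follows_restart(events):
    """True if a trimmed warning is logged at or after the earliest restart."""
    restarts = [ts for ts, kind in events if kind != "trimmed" and ts]
    if not restarts:
        return False
    first_restart = min(restarts)
    # Timestamps share one fixed-width UTC format, so string comparison
    # orders them chronologically.
    return any(
        kind == "trimmed" and ts is not None and ts >= first_restart
        for ts, kind in events
    )
```

Pass the lines of nsxapi.log, policy.log, or syslog to `find_events`. A `True` result from `trim_follows_restart` only indicates that the log pattern matches the symptoms described here; it does not confirm that objects are missing.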
Environment
VMware NSX-T
Cause
The issue is caused by concurrent operations: Corfu performs its routine trim and checkpoint operations while a restart of the NSX Manager or Policy service is in progress. When this happens, a Corfu trimmed exception occurs, the affected NSX Manager is presented with empty Corfu tables, and the NSX-T objects appear to be missing.
Resolution
This issue is resolved in NSX-T 2.5.0.
Workaround: To work around the issue, restart the NSX Manager or Policy service on the impacted NSX Manager(s).
1. Identify the impacted NSX Manager(s):
   # grep "address is trimmed" /var/log/proton/nsxapi.log
   # grep "address is trimmed" /var/log/policy/policy.log
   Example of output:
   2019-11-06T09:36:58.379Z WARN pool-2-thread-1 FastObjectLoader - applyForEachAddress[171952709, start=171952684] address is trimmed
   2019-11-06T09:36:58.387Z WARN pool-2-thread-1 FastObjectLoader - applyForEachAddress[171952709, start=171957863] address is trimmed
2. Based on the result of step 1, restart the relevant NSX Manager service:
   - If the "address is trimmed" error is found in nsxapi.log, restart the NSX Manager service:
     #> restart service manager
   - If the "address is trimmed" error is found in policy.log, restart the NSX Manager policy service:
     #> restart service policy
3. Verify the object counts are as expected using REST API:
   GET /api/v1/ns-groups
   GET /api/v1/firewall/sections/summary
   GET /api/v1/logical-switches/status
   GET /api/v1/logical-ports
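The decision in steps 1 and 2 can be sketched in a few lines of Python. This is an illustrative helper only: the log paths and CLI commands are the ones listed in the steps above, while the function names (`has_trimmed_warning`, `commands_to_run`, `read_logs`) are hypothetical. The script only reports which command applies; it does not run anything on the appliance.

```python
from pathlib import Path

NEEDLE = "address is trimmed"

# Log path -> NSX CLI command, paired exactly as in step 2.
RESTART_COMMANDS = {
    "/var/log/proton/nsxapi.log": "restart service manager",
    "/var/log/policy/policy.log": "restart service policy",
}


def has_trimmed_warning(text):
    """True if any line of the log text contains the trimmed warning."""
    return any(NEEDLE in line for line in text.splitlines())


def commands_to_run(log_texts):
    """Map {log path: file contents} to the list of CLI commands that apply."""
    return [
        command
        for path, command in RESTART_COMMANDS.items()
        if has_trimmed_warning(log_texts.get(path, ""))
    ]


def read_logs(paths=tuple(RESTART_COMMANDS)):
    """Read whichever of the known log files exist on this node."""
    texts = {}
    for path in paths:
        log_file = Path(path)
        if log_file.is_file():
            texts[path] = log_file.read_text(errors="replace")
    return texts
```

On an NSX Manager node, `commands_to_run(read_logs())` returns an empty list when neither log contains the warning. Keeping file access separate from the decision makes the logic easy to check against saved log excerpts.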
Note: The Corfu trimmed exception may occur again after a manual restart of the NSX Manager or Policy service. If it does, restart the service again.
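One way to decide whether another restart is needed is to compare the object counts after the restart with counts recorded while the environment was healthy. The sketch below assumes such a baseline exists and that the operator has already extracted one count per endpoint from the API responses. How each count is read from a response body is deliberately left out, because the response shape differs by endpoint. The name `endpoints_below_baseline` is hypothetical.

```python
# Endpoints listed in the verification step of this article.
ENDPOINTS = [
    "/api/v1/ns-groups",
    "/api/v1/firewall/sections/summary",
    "/api/v1/logical-switches/status",
    "/api/v1/logical-ports",
]


def endpoints_below_baseline(baseline, current):
    """Return endpoints whose current object count is lower than the baseline.

    Both arguments map endpoint path -> object count. An endpoint missing
    from `current` is treated as returning zero objects.
    """
    return [
        endpoint
        for endpoint in ENDPOINTS
        if current.get(endpoint, 0) < baseline.get(endpoint, 0)
    ]


def restart_again(baseline, current):
    """True if any endpoint still reports fewer objects than expected."""
    return bool(endpoints_below_baseline(baseline, current))
```

A lower count is only a hint. Objects may have been deleted legitimately since the baseline was taken, so confirm with the "address is trimmed" log check before restarting the service again.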