After onboarding NSX 4.2.2 on Security Services Platform (SSP) 5.1, the Inventory Sync status changes to down and remains down for a long time. The status is expected to be down during initial onboarding, while config and inventory data are synced from NSX to SSP. If the status remains down for more than 40 minutes, however, the environment may have hit this issue, in which the NSX Pace Agent keeps sending continuous full syncs because of an internal error caused by a Corfu timeout.
Environment: NSX 4.2.2 onboarded on SSP 5.1 or later
When an ongoing full sync is interrupted (e.g., due to a Kafka connectivity failure or any event from SSP), the interruption may not succeed. This can leave the existing full sync and a new full sync running in parallel. A full sync is resource-intensive, and parallel full syncs can cause a resource crunch that delays processing until the configured timeouts are exceeded. SSP then retries the full sync, which triggers yet another full sync and results in a continuous loop of full sync attempts. The Inventory Sync status therefore stays down while full syncs are ongoing.
Symptoms:
1. Inventory Sync status remains down for an extended period.
To check the status, navigate to System → NSX Managers → Connectivity Agent (View Details).
2. Agent status API shows a TrimmedException message.
To find the reason for the Inventory Sync down status, query the agent status API with the following curl command:
curl -k --location -u '<user-name>:<password>' --request GET 'https://<nsx-ip>/policy/api/v1/infra/sites/napp/agent/status'
Search for PACE_UFOSTORE_CONNECT in the response. The message should look like the following:
{
  "napp_agent_statuses" : [ {
    "agent_type" : "AGENT_TYPE_PACE",
    "agent_state" : {
      "enabled" : true,
      "actions" : [
        ......
        {
          "action_name" : "PACE_UFOSTORE_CONNECT",
          "required_for_init" : true,
          "status" : "ERROR",
          "message" : "PACE_UFOSTORE_CONNECT failed due to: org.corfudb.runtime.exceptions.StreamingException: org.corfudb.runtime.exceptions.TrimmedException: Subscription Stream[nsx$tag:intelligence][85d8] :: sync start address falls behind trim mark. This will incur in data loss for data in the space [10594687, 10612878] (inclusive)",
          "metadata" : {
            "_create_user" : "system",
            "_create_time" : 1759387837093,
            "_last_modified_user" : "system",
            "_last_modified_time" : 1759473127200
          }
        }
        .....
      ]
    }
  } ]
}
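The check described above can also be scripted. The following is a minimal sketch, not an official tool: it assumes the response has the shape shown above, and the helper names find_trimmed_exception and trimmed_range are hypothetical. SAMPLE_RESPONSE is a trimmed-down copy of the response above, used so the sketch runs without access to an NSX Manager. In practice, the text returned by the curl command would be passed in instead.

```python
import json
import re

# Trimmed-down copy of the sample response above (elided actions and metadata omitted).
SAMPLE_RESPONSE = """
{
  "napp_agent_statuses": [{
    "agent_type": "AGENT_TYPE_PACE",
    "agent_state": {
      "enabled": true,
      "actions": [{
        "action_name": "PACE_UFOSTORE_CONNECT",
        "required_for_init": true,
        "status": "ERROR",
        "message": "PACE_UFOSTORE_CONNECT failed due to: org.corfudb.runtime.exceptions.StreamingException: org.corfudb.runtime.exceptions.TrimmedException: Subscription Stream[nsx$tag:intelligence][85d8] :: sync start address falls behind trim mark. This will incur in data loss for data in the space [10594687, 10612878] (inclusive)"
      }]
    }
  }]
}
"""


def find_trimmed_exception(response_text):
    """Return the failing PACE_UFOSTORE_CONNECT action, or None if not present.

    Hypothetical helper: assumes the response layout shown in this article.
    """
    data = json.loads(response_text)
    for agent in data.get("napp_agent_statuses", []):
        if agent.get("agent_type") != "AGENT_TYPE_PACE":
            continue
        for action in agent.get("agent_state", {}).get("actions", []):
            if (action.get("action_name") == "PACE_UFOSTORE_CONNECT"
                    and action.get("status") == "ERROR"
                    and "TrimmedException" in action.get("message", "")):
                return action
    return None


def trimmed_range(message):
    """Extract the (start, end) address range reported in the message, if any."""
    match = re.search(r"\[(\d+),\s*(\d+)\]\s*\(inclusive\)", message)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    action = find_trimmed_exception(SAMPLE_RESPONSE)
    if action is None:
        print("No PACE_UFOSTORE_CONNECT TrimmedException found")
    else:
        print("Symptom matches; affected address range:", trimmed_range(action["message"]))
```

If find_trimmed_exception returns None, the response does not show the TrimmedException symptom described in this article.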
A Proton restart is needed to address the problem. Restart Proton on the NSX (UA) node that is the master (leader) for INTELLIGENCE_AGENT_SERVICE.
STEP 1: SSH to NSX and identify the master (leader) NSX node
su admin -c 'get cluster status verbose' | grep 'INTELLIGENCE_AGENT_SERVICE\|nsx'
// Sample output
INTELLIGENCE_AGENT_SERVICE 1 SMALL 3bd92c42-b485-8b5d-23a5-830feac10f3c 61977
3bd92c42-b485-8b5d-23a5-830feac10f3c 20.x.x.12 not configured UP nsx-mgr-0
03a72c42-14e6-9368-4b74-aab90017b099 20.x.x.13 not configured UP nsx-mgr-1
b5502c42-4980-fec1-0b08-82a749cf2131 20.x.x.14 not configured UP nsx-mgr-2
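Here nsx-mgr-0 (20.x.x.12) is the leader node for INTELLIGENCE_AGENT_SERVICE, because the UUID on the INTELLIGENCE_AGENT_SERVICE line (3bd92c42-b485-8b5d-23a5-830feac10f3c) matches the UUID of that node.
STEP 2: SSH to the leader NSX node and run the following command to restart Proton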
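The UUID matching used to identify the leader can be sketched in a few lines of Python. This is an illustration only: it assumes the filtered output has the same column layout as the sample output above (service line with the leader UUID, node lines starting with the node UUID, then the IP, and ending with the node name). The helper name find_leader is hypothetical, and real output may differ.

```python
import re

# Sample output copied from this article (already filtered by grep).
SAMPLE_OUTPUT = """\
INTELLIGENCE_AGENT_SERVICE 1 SMALL 3bd92c42-b485-8b5d-23a5-830feac10f3c 61977
3bd92c42-b485-8b5d-23a5-830feac10f3c 20.x.x.12 not configured UP nsx-mgr-0
03a72c42-14e6-9368-4b74-aab90017b099 20.x.x.13 not configured UP nsx-mgr-1
b5502c42-4980-fec1-0b08-82a749cf2131 20.x.x.14 not configured UP nsx-mgr-2
"""

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def find_leader(output, service="INTELLIGENCE_AGENT_SERVICE"):
    """Return {'uuid', 'ip', 'name'} for the service leader, or None.

    Hypothetical helper: assumes the column layout of the sample output.
    """
    leader_uuid = None
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == service:
            # The service line carries the UUID of its leader node.
            match = UUID_RE.search(line)
            if match:
                leader_uuid = match.group(0)
        elif UUID_RE.fullmatch(fields[0]) and len(fields) >= 3:
            # Node line: UUID first, IP second, node name last.
            nodes[fields[0]] = (fields[1], fields[-1])
    if leader_uuid is None or leader_uuid not in nodes:
        return None
    ip, name = nodes[leader_uuid]
    return {"uuid": leader_uuid, "ip": ip, "name": name}


if __name__ == "__main__":
    print(find_leader(SAMPLE_OUTPUT))
```

With the sample output, the sketch resolves the leader to nsx-mgr-0 at 20.x.x.12, which is the node on which Proton should be restarted.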
systemctl restart proton