Inventory Sync fails to recover on SSP - PACE_UFOSTORE_CONNECT or PACE_AGENT_INIT

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

After onboarding NSX 4.2.2 on Security Services Platform (SSP) 5.1, the Inventory Sync status changes to down and remains down for a long time. A down status is expected during initial onboarding, while configuration and inventory data are synced from NSX to SSP. If the status remains down for more than 40 minutes, the environment may have hit this issue, in which the NSX Pace Agent keeps sending continuous full syncs because of an internal error caused by a Corfu timeout.

Environment

NSX 4.2.2 onboarded on SSP 5.1 or later

Cause

When an ongoing full sync is interrupted (for example, by a Kafka connectivity failure or by any event from SSP), the interruption may not succeed, so the existing full sync keeps running while a new full sync starts. A full sync already consumes more resources than normal operation, and two full syncs running in parallel can exhaust resources, causing delays that exceed the configured timeouts. SSP then retries to complete the full sync, which triggers yet another full sync and results in a continuous loop of full sync attempts. The Inventory Sync status therefore stays down because a full sync is always in progress.

Symptoms:

1. Inventory Sync status remains down for an extended period.
To check the status, navigate to System → NSX Managers → Connectivity Agent (View Details).

2. The agent status API shows a TrimmedException message or a PACE_AGENT_INIT failure.

To check the reason for the Inventory Sync down status, query the agent status API with the curl command below:


curl -k --location -u '<user-name>:<password>' --request GET 'https://<nsx-ip>/policy/api/v1/infra/sites/napp/agent/status'
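For scripted checks, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_status_request` and `fetch_status` are hypothetical, and certificate verification is disabled only to mirror the `-k` flag of the curl command above.

```python
import base64
import ssl
import urllib.request

STATUS_PATH = "/policy/api/v1/infra/sites/napp/agent/status"


def build_status_request(nsx_host, user, password):
    """Build a GET request for the agent status API with HTTP Basic auth."""
    url = f"https://{nsx_host}{STATUS_PATH}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )


def fetch_status(request, timeout=30):
    """Send the request and return the response body as text.

    Certificate checks are skipped, like curl -k. Use only against
    hosts you trust.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Replace the placeholders before running against a real NSX Manager.
    req = build_status_request("<nsx-ip>", "<user-name>", "<password>")
    print(req.get_method(), req.full_url)
```

Calling `fetch_status(req)` returns the JSON text of the response.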

The response contains an error for PACE_UFOSTORE_CONNECT or PACE_AGENT_INIT, similar to the following:

{
  "napp_agent_statuses" : [ {
    "agent_type" : "AGENT_TYPE_PACE",
    "agent_state" : {
      "enabled" : true,
      "actions" : [ 

      ......

      {
        "action_name" : "PACE_UFOSTORE_CONNECT",
        "required_for_init" : true,
        "status" : "ERROR",
        "message" : "PACE_UFOSTORE_CONNECT failed due to: org.corfudb.runtime.exceptions.StreamingException: org.corfudb.runtime.exceptions.TrimmedException: Subscription Stream[nsx$tag:intelligence][85d8] :: sync start address falls behind trim mark. This will incur in data loss for data in the space [10594687, 10612878] (inclusive)",
        "metadata" : {
          "_create_user" : "system",
          "_create_time" : 1759387837093,
          "_last_modified_user" : "system",
          "_last_modified_time" : 1759473127200
        }
      
      }

      .....

      ]
    }
  } ]
}

Or

{
  "napp_agent_statuses" : [ {
    "agent_type" : "AGENT_TYPE_PACE",
    "agent_state" : {
      "enabled" : true,
      "actions" : [ 

      .....

      {
        "action_name" : "PACE_AGENT_INIT",
        "required_for_init" : true,
        "status" : "ERROR",
        "message" : "PACE_AGENT_INIT failed",
        "metadata" : {
          "_create_user" : "system",
          "_create_time" : 1760108381716,
          "_last_modified_user" : "system",
          "_last_modified_time" : 1760108381800
        }
      }

      .....

      ]
    }
  } ]
}
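Because the full response lists many actions, it can help to filter it down to the failing ones. The sketch below assumes the response has the shape shown in the samples above (`napp_agent_statuses` → `agent_state` → `actions`). The helper names `failed_pace_actions` and `matches_known_issue` are hypothetical, and the embedded sample is a trimmed copy of the second response above.

```python
import json

# Action names associated with this issue, taken from the samples above.
KNOWN_ACTIONS = {"PACE_UFOSTORE_CONNECT", "PACE_AGENT_INIT"}

# Trimmed copy of the second sample response, kept small for illustration.
SAMPLE_RESPONSE = """
{
  "napp_agent_statuses" : [ {
    "agent_type" : "AGENT_TYPE_PACE",
    "agent_state" : {
      "enabled" : true,
      "actions" : [ {
        "action_name" : "PACE_AGENT_INIT",
        "required_for_init" : true,
        "status" : "ERROR",
        "message" : "PACE_AGENT_INIT failed"
      } ]
    }
  } ]
}
"""


def failed_pace_actions(response_text):
    """Return (action_name, message) pairs for PACE actions in ERROR status."""
    data = json.loads(response_text)
    failures = []
    for agent in data.get("napp_agent_statuses", []):
        if agent.get("agent_type") != "AGENT_TYPE_PACE":
            continue
        for action in agent.get("agent_state", {}).get("actions", []):
            if action.get("status") == "ERROR":
                failures.append((action.get("action_name"), action.get("message", "")))
    return failures


def matches_known_issue(failures):
    """True if any failing action is one of the two named in this article."""
    return any(name in KNOWN_ACTIONS for name, _ in failures)


if __name__ == "__main__":
    for name, message in failed_pace_actions(SAMPLE_RESPONSE):
        print(f"{name}: {message}")
```

A match on the action name alone is only a hint. Read the `message` field as well before deciding to proceed to the resolution.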

Resolution

A Proton restart is needed to address the problem. Restart Proton on the NSX (UA) node that is the leader (master) for INTELLIGENCE_AGENT_SERVICE.

STEP 1: SSH to NSX and identify the leader (master) NSX node


su admin -c 'get cluster status verbose' | grep 'INTELLIGENCE_AGENT_SERVICE\|nsx'

// Sample output
    INTELLIGENCE_AGENT_SERVICE                                     1                    SMALL                3bd92c42-b485-8b5d-23a5-830feac10f3c       61977                           
    3bd92c42-b485-8b5d-23a5-830feac10f3c       20.x.x.12       not configured                                   UP               nsx-mgr-0                                 
    03a72c42-14e6-9368-4b74-aab90017b099       20.x.x.13       not configured                                   UP               nsx-mgr-1                                 
    b5502c42-4980-fec1-0b08-82a749cf2131       20.x.x.14       not configured                                   UP               nsx-mgr-2    




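The UUID on the INTELLIGENCE_AGENT_SERVICE line (3bd92c42-b485-8b5d-23a5-830feac10f3c) matches the UUID of nsx-mgr-0, so in this example nsx-mgr-0 (20.x.x.12) is the leader node for INTELLIGENCE_AGENT_SERVICE.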
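The UUID matching can also be done programmatically. The sketch below assumes the filtered output has the column layout of the sample above: the service line carries the leader UUID, and each node row starts with the node UUID, followed by the IP address, with the hostname in the last column. The function name `find_service_leader` is hypothetical, and other NSX versions may print a different layout.

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

# Sample output copied from the article (whitespace condensed).
SAMPLE_OUTPUT = """
    INTELLIGENCE_AGENT_SERVICE   1   SMALL   3bd92c42-b485-8b5d-23a5-830feac10f3c   61977
    3bd92c42-b485-8b5d-23a5-830feac10f3c   20.x.x.12   not configured   UP   nsx-mgr-0
    03a72c42-14e6-9368-4b74-aab90017b099   20.x.x.13   not configured   UP   nsx-mgr-1
    b5502c42-4980-fec1-0b08-82a749cf2131   20.x.x.14   not configured   UP   nsx-mgr-2
"""


def find_service_leader(output, service="INTELLIGENCE_AGENT_SERVICE"):
    """Return the leader node for a service as a dict, or None if not found."""
    leader_uuid = None
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == service:
            # The leader UUID is the first UUID-shaped field on the service line.
            leader_uuid = next((f for f in fields[1:] if UUID_RE.fullmatch(f)), None)
        elif UUID_RE.fullmatch(fields[0]) and len(fields) >= 3:
            # Node row: UUID first, IP second, hostname last (assumed layout).
            nodes[fields[0]] = (fields[1], fields[-1])
    if leader_uuid is None or leader_uuid not in nodes:
        return None
    ip, hostname = nodes[leader_uuid]
    return {"uuid": leader_uuid, "ip": ip, "hostname": hostname}


if __name__ == "__main__":
    print(find_service_leader(SAMPLE_OUTPUT))
```

The function returns `None` when the service line or the matching node row is missing. In that case, check the full, unfiltered command output manually.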

STEP 2: SSH to the leader NSX node as root and run the command below to restart Proton


systemctl restart proton