Inventory Sync fails to recover on SSP - PACE_UFOSTORE_CONNECT failed

Article ID: 412043

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

After NSX 4.2.2 is onboarded on Security Services Platform (SSP) 5.1, the Inventory Sync status changes to down and stays down for a long time. A down status is expected during initial onboarding, while config and inventory data are synced from NSX to SSP. If the status remains down for more than 40 minutes, the deployment may have hit this issue, in which the NSX Pace Agent keeps sending continuous full syncs because of an internal error caused by a Corfu timeout.

 

Environment

SSP 5.1 or above onboarded to NSX 4.2.2

Cause

When an ongoing full sync is interrupted (e.g., by a Kafka connectivity failure or an event from SSP), the interruption may not succeed, so the existing full sync keeps running while a new full sync starts in parallel. A full sync is resource-intensive, and parallel full syncs can cause a resource crunch, with delays that exceed the configured timeouts. SSP then retries the full sync, which triggers yet another full sync and results in a continuous loop of full sync attempts. The Inventory Sync status therefore stays down because a full sync is always in progress.

 

Symptoms:

1. Inventory Sync status remains down for an extended period.
Navigate to System → NSX Managers → Connectivity Agent (View Details)


2. Agent status API shows a TrimmedException message.

To find the reason for the Inventory Sync down status, query the agent status API with the curl command below:


curl -k --location -u '<user-name>:<password>' --request GET 'https://<nsx-ip>/policy/api/v1/infra/sites/napp/agent/status'

 

Search for PACE_UFOSTORE_CONNECT in the response. The message should look like the following:

{
  "napp_agent_statuses" : [ {
    "agent_type" : "AGENT_TYPE_PACE",
    "agent_state" : {
      "enabled" : true,
      "actions" : [ 

      ......

      {
        "action_name" : "PACE_UFOSTORE_CONNECT",
        "required_for_init" : true,
        "status" : "ERROR",
        "message" : "PACE_UFOSTORE_CONNECT failed due to: org.corfudb.runtime.exceptions.StreamingException: org.corfudb.runtime.exceptions.TrimmedException: Subscription Stream[nsx$tag:intelligence][85d8] :: sync start address falls behind trim mark. This will incur in data loss for data in the space [10594687, 10612878] (inclusive)",
        "metadata" : {
          "_create_user" : "system",
          "_create_time" : 1759387837093,
          "_last_modified_user" : "system",
          "_last_modified_time" : 1759473127200
        }
      }

      .....

      ]
    }
  } ]
}
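The search for PACE_UFOSTORE_CONNECT in the response can be scripted. The sketch below is a minimal illustration and not part of NSX: check_ufostore is a hypothetical helper name, and it assumes the response is pretty-printed one field per line, as in the sample above, so that "status" and "message" follow within three lines of "action_name".

```shell
#!/bin/sh
# Hypothetical helper (not an NSX tool): reads the agent status JSON on stdin
# and reports whether PACE_UFOSTORE_CONNECT failed with a TrimmedException.
# Assumes pretty-printed JSON laid out like the sample response.
check_ufostore() {
    # Keep the action_name line plus the next three lines
    # (required_for_init, status, message in the sample).
    block=$(grep -A 3 '"PACE_UFOSTORE_CONNECT"' || true)
    if [ -z "$block" ]; then
        echo "PACE_UFOSTORE_CONNECT action not found in response"
        return 2
    fi
    if printf '%s\n' "$block" | grep -q 'TrimmedException'; then
        echo "MATCH: PACE_UFOSTORE_CONNECT failed with TrimmedException"
        return 0
    fi
    echo "NO MATCH: PACE_UFOSTORE_CONNECT present without TrimmedException"
    return 1
}
```

To use it, pipe the output of the curl command above into the function, for example: curl -k -s -u '<user-name>:<password>' 'https://<nsx-ip>/policy/api/v1/infra/sites/napp/agent/status' | check_ufostore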

Resolution

A Proton restart is needed to resolve the problem. Restart Proton on the NSX (UA) node that is the master for INTELLIGENCE_AGENT_SERVICE.

 

STEP 1: SSH to NSX and identify the master NSX node


su admin -c "get cluster status verbose" | grep 'INTELLIGENCE_AGENT_SERVICE\|nsx'

// Sample output
    INTELLIGENCE_AGENT_SERVICE                                     1                    SMALL                3bd92c42-b485-8b5d-23a5-830feac10f3c       61977                           
  3bd92c42-b485-8b5d-23a5-830feac10f3c       20.x.x.12       not configured                                   UP               nsx-mgr-0                                
  03a72c42-14e6-9368-4b74-aab90017b099       20.x.x.13       not configured                                   UP               nsx-mgr-1                                
  b5502c42-4980-fec1-0b08-82a749cf2131       20.x.x.14       not configured                                   UP               nsx-mgr-2  




Here nsx-mgr-0 (20.x.x.12) is the leader (master) node for INTELLIGENCE_AGENT_SERVICE.
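Matching the leader UUID to a node row by eye is error-prone, so the lookup can be sketched in awk. This is only an illustration: find_intelligence_leader is a hypothetical helper name, and the column positions (leader UUID in the fourth field of the service row; UUID, IP, and hostname as the first, second, and last fields of each node row) are assumptions taken from the sample output above. The input must be the grep-filtered output shown, not the full verbose listing.

```shell
#!/bin/sh
# Hypothetical helper (not an NSX tool): reads the filtered cluster status
# output on stdin and prints "<hostname> <ip>" for the node that leads
# INTELLIGENCE_AGENT_SERVICE. Column positions are assumed from the sample.
find_intelligence_leader() {
    awk '
        # Service row: the leader UUID is the 4th whitespace-separated field.
        $1 == "INTELLIGENCE_AGENT_SERVICE" { leader = $4 }
        # Node rows: UUID first, IP second, hostname last.
        $1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
            ip[$1] = $2
            name[$1] = $NF
        }
        END {
            # Resolve the leader only after all rows are read, so row order
            # does not matter; exit non-zero when no match is found.
            if (leader != "" && (leader in ip)) print name[leader], ip[leader]
            else exit 1
        }
    '
}
```

For example, pipe the output of the command from STEP 1 into find_intelligence_leader. A non-zero exit status means no node row matched the leader UUID, in which case read the output manually.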


STEP 2: SSH to the leader NSX node and run the below command to restart Proton


systemctl restart proton