Duplicate Incident IDs seen when Fetching DLP Cloud Incidents via API

Article ID: 421644

Products

Data Loss Prevention Cloud Package

CASB Gateway

Issue/Introduction

When an integration fetches incidents from the DLP Cloud API, the same incident ID can appear on more than one page of results. This typically occurs when the integration script paginates manually with a "Greater Than or Equal To" filter (e.g., incidentId >= X) to retrieve new records in batches.

Environment

DLP Cloud

Symantec CloudSOC

Cause

The issue is caused by the behavior of the underlying search engine when using manual filtering for pagination on a dynamic dataset.

If new incidents are generated or the index updates while an API fetch operation is in progress, the sort order of the data may shift. Consequently, records that were already successfully fetched on a previous page may "slide" into the retrieval window of the next page, resulting in the same incident ID being returned twice.
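The effect can be reproduced with a toy model. The sketch below is not the DLP Cloud search engine. It assumes, purely for illustration, that a page is taken by position in a list sorted by (detectionDate, incidentId) and that a late-indexed incident with an earlier detection date appears between two fetches. All names and values are hypothetical.

```python
# Toy model only: each incident is a (detectionDate, incidentId) tuple.
index = [(100, 1), (200, 2), (300, 3), (400, 4)]


def page_by_position(rows, start, size):
    """Return `size` rows starting at position `start` in sorted order."""
    return sorted(rows)[start:start + size]


def page_after_cursor(rows, cursor, size):
    """Return up to `size` rows that sort strictly after `cursor`."""
    return [row for row in sorted(rows) if row > cursor][:size]


page1 = page_by_position(index, 0, 2)

# An incident with an early detection date is indexed between the two
# fetches, shifting every later row one position to the right.
index.append((150, 5))

# Fetching "the next two rows" by position now starts one row too early.
shifted = page_by_position(index, 2, 2)

# Resuming from the last row already fetched is unaffected by the shift.
resumed = page_after_cursor(index, page1[-1], 2)

duplicates = set(page1) & set(shifted)
print("duplicates with positional paging:", duplicates)
print("duplicates with cursor paging:", set(page1) & set(resumed))
```

In this model the row fetched last on the first page reappears on the positional second page, while the cursor-based page only contains rows that sort after it.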

Resolution

To eliminate duplicates and ensure a consistent data stream, the integration logic must switch to the searchAfterValue pagination method. This method uses a system-generated cursor to accurately track the position in the index, regardless of incoming new data.

Implementation requires a two-step workflow:

Step 1: The Initial "Bootstrap" Query

The first request establishes the starting point. You must sort by detectionDate (ascending). The system will implicitly use the incident ID as a secondary tie-breaker.

Instructions:

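  • Identify your starting ID: In the operandTwoValues field, enter the ID of the last incident you successfully processed. Because the operator is gte (greater than or equal to), that incident itself also matches the filter, so expect it to appear in the results again.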

  • Example Scenario: In the query below, we are fetching incidents starting from ID 1000.

Sample Request Body (Initial Call):

{
  "select": [
    { "name": "incidentId" },
    { "name": "detectionDate" }
  ],
  "filter": {
    "filterType": "long",
    "operandOne": {"name": "incidentId"},
    "operator": "gte",
    "operandTwoValues": [1000]
  },
  "orderBy": [
    {
      "field": {"name": "detectionDate"},
      "order": "ASC"
    }
  ],
  "limit": 10
}
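If the integration is written in Python, the request body above can be generated rather than edited by hand. This is a minimal sketch: the function name build_bootstrap_body is hypothetical, and the endpoint URL and authentication are deliberately left out because they are not covered in this article.

```python
import json


def build_bootstrap_body(start_incident_id: int, limit: int = 10) -> dict:
    """Build the initial request body for a given starting incident ID."""
    return {
        "select": [{"name": "incidentId"}, {"name": "detectionDate"}],
        "filter": {
            "filterType": "long",
            "operandOne": {"name": "incidentId"},
            "operator": "gte",
            "operandTwoValues": [start_incident_id],
        },
        # Ascending detectionDate, as required for the cursor to work.
        "orderBy": [{"field": {"name": "detectionDate"}, "order": "ASC"}],
        "limit": limit,
    }


body = build_bootstrap_body(1000)
payload = json.dumps(body)  # JSON string to send as the request body
print(payload)
```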

Step 2: The Pagination Loop

Upon sending the request above, the API returns a batch of up to 10 incidents (e.g., IDs 1000 through 1009). Inspect the API response to find the cursor for the next batch.

1. Locate the Cursor in the Response:

Look for the searchAfterValue field in the response metadata. It contains the timestamp and ID of the last incident in that batch (e.g., ID 1009).

Response Example:

{
  "meta": {
    "totalCount": 5000,
    "searchAfterValue": [1646424416000, 1009]
  },
  "incidents": [ ... ]
}

2. Construct the Next Query:

For the next request, remove the manual incidentId filter. Instead, use the searchAfterValue you just received (from ID 1009) to tell the system exactly where to resume.

  • Action: Copy the [1646424416000, 1009] array into the page object's value field.

  • Note: The limit parameter is replaced by pageSize inside the page object.

Sample Request Body (Loop Iteration):

{
  "select": [
    { "name": "incidentId" },
    { "name": "detectionDate" }
  ],
  "filter": {
    "filterType": "booleanLogic",
    "booleanOperator": "and",
    "filters": []
  },
  "page": {
    "type": "search_after",
    "pageSize": 10,
    "value": [1646424416000, 1009]
  },
  "orderBy": [
    {
      "field": {"name": "detectionDate"},
      "order": "ASC"
    }
  ]
}
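Locating the cursor and constructing the follow-up request can be combined in one helper. The sketch below assumes the response has the shape shown in the response example. The function name build_next_body is hypothetical, and treating a missing searchAfterValue as "nothing left to fetch" is an assumption, not documented behavior.

```python
from typing import Any, Dict, Optional


def build_next_body(response: Dict[str, Any],
                    page_size: int = 10) -> Optional[Dict[str, Any]]:
    """Turn one API response into the request body for the next page."""
    cursor = response.get("meta", {}).get("searchAfterValue")
    if not cursor:
        # Assumption: no cursor means there is no next page to request.
        return None
    return {
        "select": [{"name": "incidentId"}, {"name": "detectionDate"}],
        # The manual incidentId filter is replaced by an empty filter.
        "filter": {
            "filterType": "booleanLogic",
            "booleanOperator": "and",
            "filters": [],
        },
        # pageSize takes over the role of the top-level limit parameter.
        "page": {
            "type": "search_after",
            "pageSize": page_size,
            "value": list(cursor),
        },
        "orderBy": [{"field": {"name": "detectionDate"}, "order": "ASC"}],
    }


previous_response = {
    "meta": {"totalCount": 5000, "searchAfterValue": [1646424416000, 1009]},
    "incidents": [],
}
next_body = build_next_body(previous_response)
print(next_body["page"])
```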

Looping: For the third request, use the searchAfterValue from the second response (e.g., the cursor for ID 1019), and so on, until no further incidents are returned.
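The whole workflow can be sketched as a generator. Everything here is illustrative: post is a hypothetical callable that sends one request body to the incidents query endpoint and returns the parsed JSON response, and fake_post is an in-memory stand-in used only to exercise the loop, not a model of the real service. Stopping when searchAfterValue is absent is an assumption.

```python
from typing import Any, Callable, Dict, Iterator

Json = Dict[str, Any]
EMPTY_FILTER: Json = {"filterType": "booleanLogic", "booleanOperator": "and", "filters": []}


def iter_incidents(post: Callable[[Json], Json], first_body: Json) -> Iterator[Json]:
    """Yield incidents page by page, following searchAfterValue."""
    page_size = first_body.get("limit", 10)
    body = first_body
    while True:
        response = post(body)
        incidents = response.get("incidents", [])
        if not incidents:
            return  # no further incidents were returned
        yield from incidents
        cursor = response.get("meta", {}).get("searchAfterValue")
        if not cursor:
            return  # assumption: a missing cursor also ends the stream
        # Follow-up request: drop the manual filter and limit, add page.
        body = {k: v for k, v in first_body.items() if k not in ("filter", "limit")}
        body["filter"] = EMPTY_FILTER
        body["page"] = {"type": "search_after", "pageSize": page_size, "value": cursor}


def fake_post(body: Json) -> Json:
    """In-memory stand-in for the API holding 25 made-up incidents."""
    rows = [{"incidentId": 1000 + i, "detectionDate": 1646424416000 + i} for i in range(25)]
    if "page" in body:
        after = tuple(body["page"]["value"])
        rows = [r for r in rows if (r["detectionDate"], r["incidentId"]) > after]
        size = body["page"]["pageSize"]
    else:
        start = body["filter"]["operandTwoValues"][0]
        rows = [r for r in rows if r["incidentId"] >= start]
        size = body["limit"]
    batch = rows[:size]
    meta: Json = {}
    if batch:
        meta["searchAfterValue"] = [batch[-1]["detectionDate"], batch[-1]["incidentId"]]
    return {"meta": meta, "incidents": batch}


first_body = {
    "select": [{"name": "incidentId"}, {"name": "detectionDate"}],
    "filter": {"filterType": "long", "operandOne": {"name": "incidentId"},
               "operator": "gte", "operandTwoValues": [1000]},
    "orderBy": [{"field": {"name": "detectionDate"}, "order": "ASC"}],
    "limit": 10,
}
ids = [incident["incidentId"] for incident in iter_incidents(fake_post, first_body)]
print(len(ids), "incidents fetched,", len(set(ids)), "unique")
```

Passing the transport in as a callable keeps the pagination logic independent of the HTTP client and authentication scheme, which makes the loop easy to test before pointing it at the real API.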

Additional Information

About the DLP Incidents Query