Lastline NDR - Delayed Analysis and Growing Retry Queues Due to inserted_pcap Worker Failures

Article ID: 405147

Products

VMware vDefend Network Detection and Response

Issue/Introduction

Several customers have reported delayed analysis and high message volumes in the RabbitMQ retry queues of their on-premises NSX Lastline installations. Affected environments exhibited symptoms such as:

  • Sluggish analysis pipelines

  • Eventual SIEM alert delays

  • Accumulating queue sizes, particularly within retry.N queues

Example queue status from rabbitmqctl. To list only the queues that currently hold messages, run: rabbitmqctl -p llq_v1 list_queues | grep -v -w 0$
The output shows the retry queues and their message counts:

retry.0 334
retry.1 775
retry.2 419
retry.3 1976
retry.4 1927
retry.5 5539
 
This indicated repeated failed processing attempts for certain message types.
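
To track the backlog over time, the listing above can be summarized with a short script. The following is a minimal sketch, assuming the output consists of "name count" lines like those shown. The function names parse_queue_listing and retry_backlog are illustrative and not part of any Lastline or RabbitMQ tooling.

```python
SAMPLE = """\
retry.0 334
retry.1 775
retry.2 419
retry.3 1976
retry.4 1927
retry.5 5539
"""


def parse_queue_listing(text):
    """Parse 'name count' lines into a {queue_name: message_count} dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip banners, headers, and blank lines that are not "<name> <count>".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def retry_backlog(counts, prefix="retry."):
    """Return non-empty retry.N queues ordered by N, plus their total count."""
    retry = [
        (name, count)
        for name, count in counts.items()
        if name.startswith(prefix) and name[len(prefix):].isdigit() and count > 0
    ]
    retry.sort(key=lambda item: int(item[0][len(prefix):]))
    return retry, sum(count for _, count in retry)


if __name__ == "__main__":
    queues, total = retry_backlog(parse_queue_listing(SAMPLE))
    for name, count in queues:
        print(f"{name}: {count}")
    print(f"total messages awaiting retry: {total}")
```

To run it against a live system, replace SAMPLE with the text captured from the rabbitmqctl command above. A total that keeps growing between runs suggests that messages are failing faster than they are being processed.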

Cause

Further investigation showed that the messages piling up in the retry queues, especially retry.5, had the routing key inserted_pcaps.c2736.

These messages are handled by the queue_worker_inserted-pcaps worker, which relies on the pcapapi component to interface with Suricata for analysis. The processing failures were traced back to the following:

  • The worker was receiving 500 INTERNAL SERVER ERROR responses from the pcapapi2 service.

  • These errors originated from a Python exception in suricata.py: AttributeError: 'AlprotoDns' object has no attribute 'rdata'

  • The error stems from a mismatch between the pcapapi plugin logic and the EVE JSON structures produced by newer versions of Suricata.

  • As a result, the message processing fails, enters an exponential backoff retry loop, and eventually overflows the retry queues.
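
The failure mode described above can be illustrated in isolation. The sketch below is not the pcapapi code and not a supported fix. DnsRecord and extract_rdata are hypothetical names, and the "newer" layout, with rdata nested inside an answers list, is an assumption made only for illustration. It shows how code written for one record layout raises AttributeError on another, and how a tolerant accessor would avoid the exception.

```python
class DnsRecord:
    """Hypothetical stand-in for a parsed DNS record (not the pcapapi class)."""

    def __init__(self, fields):
        # Expose each parsed JSON key as an attribute on the record.
        for key, value in fields.items():
            setattr(self, key, value)


def extract_rdata(record):
    """Collect rdata values from either assumed layout without raising."""
    # Assumed older layout: rdata sits directly on the record.
    direct = getattr(record, "rdata", None)
    if direct is not None:
        return [direct]
    # Assumed newer layout: rdata is nested in each entry of an answers list.
    answers = getattr(record, "answers", None) or []
    return [answer["rdata"] for answer in answers if "rdata" in answer]


if __name__ == "__main__":
    old_style = DnsRecord({"rrname": "example.com", "rdata": "192.0.2.10"})
    new_style = DnsRecord(
        {"answers": [{"rrname": "example.com", "rdata": "192.0.2.10"}]}
    )

    try:
        new_style.rdata  # direct access, as layout-specific code would do
    except AttributeError as exc:
        print(f"direct access fails: {exc}")

    print(extract_rdata(old_style))
    print(extract_rdata(new_style))
```

In the affected environments, the unhandled exception surfaces as the 500 response from the service, which is why each retry fails in the same way.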

Additional observations:

  • No significant logs were found in /var/log/suricata or through journalctl.

  • Suricata in this case was running directly on the manager node (not in a container).

Resolution

If you encounter this issue, kindly contact Broadcom support.

Additional Information

Additional Context: Retry Queues in RabbitMQ

  • The retry.N queues are part of an exponential backoff system for transient errors.

  • They store failed messages temporarily before retrying with increasing delay.

  • Messages are routed back to their original queue after TTL expiry for reprocessing.

  • Once max retries are exceeded, messages may be silently dropped if no binding exists for the current retry_count.
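
The retry behaviour described above can be modelled with a few lines of code. This is a simplified sketch and not the product's implementation. The base delay, the doubling factor, and the count of six bound queues (retry.0 through retry.5, taken from the example output) are assumptions chosen for illustration.

```python
RETRY_QUEUES = 6   # assumed: retry.0 .. retry.5, as seen in the example output
BASE_DELAY_S = 5   # assumed base delay in seconds
FACTOR = 2         # assumed growth factor per retry


def retry_delay(retry_count, base=BASE_DELAY_S, factor=FACTOR):
    """Delay (queue TTL) before a message with this retry_count is retried."""
    return base * factor ** retry_count


def route_failed_message(retry_count, bound_queues=RETRY_QUEUES):
    """Return (queue_name, ttl_seconds) for a failed message, or None if dropped."""
    if retry_count >= bound_queues:
        # No binding exists for this retry_count, so the message is dropped.
        return None
    return f"retry.{retry_count}", retry_delay(retry_count)


def lifecycle(bound_queues=RETRY_QUEUES):
    """List every retry hop taken by a message that fails on each attempt."""
    hops = []
    retry_count = 0
    while True:
        hop = route_failed_message(retry_count, bound_queues)
        if hop is None:
            return hops
        hops.append(hop)
        retry_count += 1


if __name__ == "__main__":
    hops = lifecycle()
    for queue, ttl in hops:
        print(f"{queue}: wait {ttl}s, then return to the original queue")
    print(f"dropped after {len(hops)} retries, "
          f"{sum(ttl for _, ttl in hops)}s total wait")
```

Because the delay grows with each retry, a message that fails on every attempt spends most of its lifetime in the highest-numbered queue. This is consistent with retry.5 holding the largest backlog in the example output.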