Event Drops in VMware Aria Operations for Logs due to Pending Queue Overload

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

VMware Aria Operations for Logs is experiencing significant event drops when forwarding logs. The issue stems from the cluster's pending queue becoming overloaded, causing it to drop incoming events.

A review of the VMware Aria Operations for Logs logs shows multiple instances where events were dropped because the cluster's pending queue was full. The following log snippets illustrate the issue:

[2024-07-31 02:18:20.016+0000] ["PersistentNotification-thread-42"/#.#.#.# INFO] [com.vmware.loginsight.daemon.notifications.PersistentNotificationQueue] [Sending notification '

{"AlertId":"Event Forwarder Events Dropped","Name":"Event Forwarder Events Dropped","Description":"VMware Aria Operations for Logs just dropped 203 events for forwarder target 'example.example.com', reason: Pending queue is full..\n\nThis message was generated by your VMware Aria Operations for Logs installation, visit the <a href='https://www.vmware.com/support/pubs/log-insight-pubs.html'>Documentation Center</a> for more information.","TriggerTime":"2024-07-31T02:18:19.659Z"}
', using notification provider 'com.vmware.loginsight.notifications.JsonLogNotificationProvider' attempt #1]

[2024-07-31 02:18:20.450+0000] ["ImportingThread-4"/#.#.#.# WARN] [com.vmware.loginsight.ingestion.forwarding.BaseForwarder] [Dropped 584 events for target example.example.com, reason: Pending queue is full. [5221 suppressed]]
[2024-07-31 02:18:50.452+0000] ["ImportingThread-3"/#.#.#.# WARN] [com.vmware.loginsight.ingestion.forwarding.BaseForwarder] [Dropped 72 events for target example.example.com, reason: Pending queue is full. [9154 suppressed]]
[2024-07-31 04:18:30.265+0000] ["ImportingThread-4"/#.#.#.# WARN] [com.vmware.loginsight.ingestion.forwarding.BaseForwarder] [Dropped 510 events for target example.example.com, reason: Pending queue is full. [7380 suppressed]]
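
To gauge the scale of the drops, the BaseForwarder WARN lines above can be tallied per target and reason. The following is a minimal Python sketch, not a supported tool. The regular expression is derived only from the sample lines in this article, and the function name tally_drops is hypothetical. The "[N suppressed]" suffix is ignored; assuming it counts log messages suppressed by rate limiting, the totals should be read as a lower bound.

```python
import re
from collections import defaultdict

# Pattern based on the sample lines in this article, for example:
#   [Dropped 584 events for target example.example.com, reason: Pending queue is full. [5221 suppressed]]
DROP_RE = re.compile(
    r"Dropped (?P<count>\d+) events for target (?P<target>[^,\s]+), "
    r"reason: (?P<reason>[^.\[\]]+)"
)


def tally_drops(lines):
    """Sum the reported dropped-event counts per (target, reason) pair.

    Lines that do not match the BaseForwarder drop pattern are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            key = (match.group("target"), match.group("reason").strip())
            totals[key] += int(match.group("count"))
    return dict(totals)
```

For example, passing an open log file to the function (`with open(path) as f: print(tally_drops(f))`, where path is whichever log file contains the WARN lines) prints one total per forwarder target and drop reason.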

In addition to the logs above, related entries may appear in /storage/var/loginsightcassandra.log on the Aria Operations for Logs node(s):

[2024-06-09 10:57:38.466+0000] ["LogImporterService-thread-9115"/#.#.#.# WARN] [com.vmware.loginsight.ingestion.importer.MultiThreadImporterService] [Unable to import messages due to disk queue full, with 100 unparsed messages for parser syslog-live [24 suppressed]]

[2025-07-24 00:26:18.024+0000] ["syslog-message-processing-thread"/#.#.#.# WARN] [com.vmware.loginsight.ingestion.routing.QueuedRemoteNodeMessageForwarder] [Unable to accept messages, queue is full for node: IngestionNode [hostName=#.#.#.#, daemonPort=16520, ingestionPort=16575, nodeToken=#####################] [7626 suppressed]]

You may also see the alert:

Dropped Events (Host = <host_name>) triggered at 2025-07-23T22:35:05.444Z
This notification was generated from Operations for Logs node (Host = <host_name>, Node Identifier = ########-####-####-####-############).
510,816,187 events messages have been dropped on node <host_name> since the last alert at 22:35:05 UTC Jul 22 2025.
4,754,984,786 events messages have been dropped on node <host_name> since the last time VMware Aria Operations for Logs was started.

Environment

VMware Aria Operations for Logs 8.x

Cause

The "Pending queue is full" and "queue is full for node" messages indicate that events are being dropped because a queue has reached capacity. This suggests the cluster cannot process or forward incoming events as fast as they arrive, so a backlog builds in the pending queue until new events are dropped.

Resolution

There are four potential solutions to address the pending queue overload:

Increase Worker Threads: The worker count is the number of simultaneous outgoing connections used for forwarding. A higher worker count is normally needed when network latency to the forwarding destination is high or when more events per second are being forwarded. The count can be increased up to 512. See "How to increase the worker count" (step 4).
Implement Forwarder Filtering (Experimental): Configure filtering on the log forwarder in the source cluster so that unwanted events are discarded before they are sent to the destination, reducing the overall load. Be aware that filtered events will be lost.
Scale Out the Cluster: Adding another node to the destination cluster increases overall processing capacity and reduces the load on the individual nodes. See "Adding a node to the cluster".
Refine Forwarding Rules: Distributing the event load across multiple Log Forwarding rules reduces the processing burden on each individual rule, minimizing the risk of dropped events.
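
The relationship between latency, event rate, and worker count can be illustrated with a rough estimate. The Python sketch below is not a product formula: it applies Little's law (requests in flight = request rate x time per request) and assumes each worker handles one request at a time. The function name and the parameters events_per_request and headroom are hypothetical; only the cap of 512 comes from this article.

```python
import math

MAX_WORKERS = 512  # upper limit for the worker count stated in this article


def estimate_workers(events_per_second, round_trip_seconds,
                     events_per_request=1, headroom=1.5):
    """Rough worker-count estimate using Little's law.

    Assumption (not documented product behavior): each worker sends one
    request at a time and waits one round trip before sending the next.
    """
    if events_per_second < 0 or round_trip_seconds < 0:
        raise ValueError("rate and latency must be non-negative")
    if events_per_request < 1 or headroom < 1:
        raise ValueError("events_per_request and headroom must be >= 1")
    requests_per_second = events_per_second / events_per_request
    needed = math.ceil(requests_per_second * round_trip_seconds * headroom)
    # Keep at least one worker and never exceed the documented maximum.
    return max(1, min(needed, MAX_WORKERS))
```

Under these assumptions, doubling either the round-trip latency or the forwarded event rate doubles the estimated worker count. When the estimate reaches the 512 cap, raising the worker count alone is unlikely to be sufficient and the other options above should be considered.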

Additional Information

The provided log snippets confirm dropped events due to a full pending queue.
Consider network latency to the forwarding destination as a potential factor contributing to the overload.
