Understanding QUEUE_LIMIT_SECONDS and QUEUE_LIMIT_MEGS

Products

VMware Smart Assurance

Issue/Introduction

How do the settings QUEUE_LIMIT_SECONDS and QUEUE_LIMIT_MEGS work in the Smarts SAM Trap Adapter (TA)/Exploder (TE) configuration file (trapd.conf)?

Environment

Smarts - 10.1.x

Resolution

NOTE: The settings explained in this article apply only to the Smarts SAM Trap Adapter/Exploder configuration. The trapd.conf parameters do not apply to the IP Trap Receiver process. The IP domain can therefore still be subject to trap floods if the Trap Exploder or another process sends it an excessive number of traps.

QUEUE_LIMIT_SECONDS

# Limits the time that a trap can spend in the internal
# trap queue. This limit is even less exact than the
# size limit below. In general, it's advisable to
# specify both. When the limit is reached, some traps
# will be discarded. The default value is 0, which
# means no limit.

The trap handling code tracks how many traps per second are being removed from the queue, then divides the current queue size by that rate to estimate how long the queue would take to drain. It compares the estimate to QUEUE_LIMIT_SECONDS. If the estimate exceeds the limit, the code identifies trap sources to discard based on the number of traps they are generating. The reason is that in trap flood cases it has often been found that one or a few devices generate an enormous number of traps. Rather than discarding traps from all devices, it is better to discard the traps from the devices generating the most traps and let the others through.
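The drain-time estimate described above can be sketched as follows. This is a minimal illustration and not the actual Smarts code: the function names and the handling of a zero drain rate are assumptions made for the example.

```python
def estimated_drain_seconds(queue_size: int, drain_rate: float) -> float:
    """Estimate how long the queue would take to empty.

    queue_size: number of traps currently queued.
    drain_rate: traps per second currently being removed from the queue.
    """
    if drain_rate <= 0:
        # Assumption for this sketch: a stalled queue never drains.
        return float("inf")
    return queue_size / drain_rate


def over_time_limit(queue_size: int, drain_rate: float,
                    queue_limit_seconds: float) -> bool:
    """Return True when the estimate exceeds QUEUE_LIMIT_SECONDS."""
    if queue_limit_seconds == 0:
        # 0 means no limit, as stated in the trapd.conf comments.
        return False
    return estimated_drain_seconds(queue_size, drain_rate) > queue_limit_seconds
```

For example, with 6000 traps queued and 200 traps per second being removed, the estimate is 30 seconds, so a QUEUE_LIMIT_SECONDS of 20 would be exceeded while a value of 60 would not.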

A well-behaved device that sends a trap only occasionally may still have that trap processed later than QUEUE_LIMIT_SECONDS if a flood from another device is causing traps to be discarded. A trap that is not selected for discard should clear the queue roughly QUEUE_LIMIT_SECONDS after it arrives, assuming the incoming trap rate and the drain/discard rate stay the same; the actual time also depends on overall server workload. QUEUE_LIMIT_SECONDS is therefore only a rough upper bound on queuing time, but a trap is unlikely to stay in the queue significantly longer than that setting: a few seconds longer at most.

QUEUE_LIMIT_MEGS

# Limits the size of internal trap queue to the
# stated size. The limit is not exact - the queue
# may grow slightly larger than that. When the limit
# is reached, some traps will be discarded. The
# default value is 0, which means no limit.

If the queue size exceeds this limit, the code identifies trap sources to discard based on the number of traps they are generating. In trap flood cases, certain devices often generate far more traps than others. Rather than discarding traps from all devices, it is better to discard the traps from the "chatty"/"noisy" devices while still letting the others through.
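The article does not document exactly how the noisy sources are chosen, so the following is only one plausible sketch of the idea. It assumes a hypothetical list of source addresses (one entry per queued trap) and, for simplicity, a limit expressed as a trap count rather than megabytes.

```python
from collections import Counter


def pick_sources_to_discard(queued_sources, max_traps):
    """Choose the noisiest sources until the queue would fit under max_traps.

    queued_sources: list with one source address per queued trap (hypothetical).
    max_traps: illustrative limit on the number of queued traps.
    """
    counts = Counter(queued_sources)
    remaining = len(queued_sources)
    discard = []
    # Walk sources from most to least traps queued.
    for source, trap_count in counts.most_common():
        if remaining <= max_traps:
            break
        discard.append(source)
        remaining -= trap_count
    return discard
```

With eight queued traps from one device and one each from two others, a limit of five traps selects only the first device, so the two quiet devices still get through.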

If the server platform does not have enough processing power (for example, if the processing rate falls below 100 traps/sec), the QUEUE_LIMIT_SECONDS algorithm is not applied, and QUEUE_LIMIT_SECONDS effectively behaves as if it were set to 0. In that case, if QUEUE_LIMIT_MEGS is not also set, the queue can grow without bound and much longer queuing delays may be seen. The QUEUE_LIMIT_MEGS algorithm is always applied, regardless of the processing rate.
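How the two limits interact can be sketched as below. This is an illustration under stated assumptions, not the product's implementation: the function and constant names are hypothetical, the 100 traps/sec threshold is taken from the example figure above, and a megabyte is assumed to be 1024 * 1024 bytes.

```python
# Example threshold from the text; the real cutoff may differ.
MIN_RATE_FOR_TIME_LIMIT = 100.0  # traps per second


def should_discard(queue_bytes: int, queue_traps: int, drain_rate: float,
                   limit_megs: int, limit_seconds: int) -> bool:
    """Return True when either configured limit calls for discarding traps."""
    # The size limit is always checked when it is non-zero.
    size_exceeded = (limit_megs > 0
                     and queue_bytes > limit_megs * 1024 * 1024)

    # The time limit is skipped when the drain rate is too low.
    time_check_active = (limit_seconds > 0
                         and drain_rate >= MIN_RATE_FOR_TIME_LIMIT)
    time_exceeded = (time_check_active
                     and queue_traps / drain_rate > limit_seconds)

    return size_exceeded or time_exceeded
```

On a slow server draining 50 traps/sec, a call such as should_discard(0, 50000, 50.0, 0, 10) returns False even though the backlog is large, because the time check is inactive and no size limit is set. This is the case where the queue can keep growing.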

It is recommended that both QUEUE_LIMIT_MEGS and QUEUE_LIMIT_SECONDS be used together, as this will work best in most situations.