Telegraf agents are dropping metrics: cloud proxy shows repeated "Metric buffer overflow; XXX metrics have been dropped"

Article ID: 397060

Updated On:

Products

VMware Aria Suite

Issue/Introduction

After upgrading to Aria Operations 8.18 HF5, graphs for metrics collected by Telegraf agents show gaps in data collection.

Environment

Aria Operations 8.18 HF 5 and later

Cause

The issue is caused by the metric buffer limit that Aria Operations sets for the Telegraf agents.

Currently, the metric buffer limit is dynamic: it is set according to the plugins running on the endpoint server, using the following per-plugin values:

linux:
  os: 100
  mysql: 600
  oracledb: 600
  postgresql: 400

Windows:
  os: 100
  mysql: 600
  mssql: 1000
  msexchange: 400
  msiis: 500
  oracledb: 600
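The table above can be treated as a lookup of per-plugin values. The sketch below copies those values and, for a list of plugins observed on an endpoint, separates the ones that have an entry from the ones that do not. The plugin labels (os, mysql, and so on) are taken from the table as written and may not match Telegraf's own input plugin names; the example input, including exec for a custom script, is hypothetical. How the individual values combine into the effective limit is not described in this article, so the sketch deliberately does not compute one.

```python
"""Split observed plugins into those listed in the buffer table and those not listed."""
from __future__ import annotations

# Per-plugin values copied from the article's table, keyed by OS.
# Labels are as written in the article; they may not equal Telegraf input plugin names.
BUFFER_TABLE = {
    "linux": {"os": 100, "mysql": 600, "oracledb": 600, "postgresql": 400},
    "windows": {
        "os": 100,
        "mysql": 600,
        "mssql": 1000,
        "msexchange": 400,
        "msiis": 500,
        "oracledb": 600,
    },
}


def split_plugins(os_name: str, running: list[str]) -> tuple[dict[str, int], list[str]]:
    """Return (plugins found in the table with their values, plugins with no table entry)."""
    table = BUFFER_TABLE[os_name.lower()]
    covered = {p: table[p] for p in running if p in table}
    uncovered = [p for p in running if p not in table]
    return covered, uncovered


if __name__ == "__main__":
    # Hypothetical example: a Linux endpoint running the OS plugin, MySQL and a custom script.
    covered, uncovered = split_plugins("linux", ["os", "mysql", "exec"])
    print("In table:", covered)
    print("Not in table:", uncovered)
```

A plugin that lands in the "not in table" list, such as a custom script, is a natural first suspect when the buffer overflows.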

To determine which plugins are running in the environment, run the following commands on the Telegraf endpoint server:

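1. Extract the agent's Telegraf configuration to /tmp/telegraf-bkp:

   sudo /opt/vmware/ucp/ucp-minion/bin/ucp-minion.sh --config /opt/vmware/ucp/salt-minion/etc/salt/grains --action xtract_config --dest_dir=/tmp/telegraf-bkp

2. Run Telegraf in test mode against the extracted configuration. Test mode gathers metrics once from every configured input, prints them to standard output, and exits without sending them anywhere:

   /opt/vmware/ucp/ucp-telegraf/usr/bin/telegraf -config /tmp/telegraf-bkp/telegraf.conf --test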
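To see which plugin produces the most metrics, the output of the test command can be saved (for example, by redirecting standard output to /tmp/telegraf-test.txt) and tallied per measurement. The sketch below assumes the saved file contains one InfluxDB line-protocol metric per line, optionally prefixed with "> " as Telegraf's test mode prints it, and that Telegraf's own log lines went to standard error (its default) rather than into the file. Measurement names usually, but not always, point to the input plugin or script that produced them. The script name count_metrics.py and the file path are hypothetical.

```python
"""Count metrics per measurement in saved `telegraf --test` output."""
from __future__ import annotations

import sys
from collections import Counter
from typing import Iterable


def measurement_name(line: str) -> str | None:
    """Return the measurement name from one line-protocol line, or None for blank/comment lines."""
    line = line.strip()
    if line.startswith("> "):  # Telegraf's test mode prefixes each metric with "> "
        line = line[2:]
    if not line or line.startswith("#"):
        return None
    chars = []
    escaped = False
    for ch in line:
        if escaped:  # keep escaped characters such as "\," or "\ " literally
            chars.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch in ", ":  # an unescaped comma or space ends the measurement name
            break
        else:
            chars.append(ch)
    return "".join(chars) or None


def count_measurements(lines: Iterable[str]) -> Counter:
    """Tally metric lines per measurement name (assumes one metric per line)."""
    counts: Counter = Counter()
    for line in lines:
        name = measurement_name(line)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical names): python3 count_metrics.py /tmp/telegraf-test.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        counts = count_measurements(fh)
    for name, n in counts.most_common():
        print(f"{n:8d}  {name}")
    print(f"{sum(counts.values()):8d}  total")
```

Sorting by count puts the heaviest producer at the top, which is where a metric-generating script would show up.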

Resolution

In certain environments, a specific plugin may generate more metrics than the buffer can hold, exceeding the limit set dynamically in the configuration.

In this case, the output of the commands above showed that the Telegraf endpoint server was running an execution script that generated more metrics than the dynamically set buffer limit allowed.
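As a rough check, the number of metrics produced by one collection (for example, the number of metric lines printed by the telegraf --test command) can be compared with the buffer limit values. The sketch below uses the current default and max_size values from metric_buffer_limit.sls quoted in this article. Treating a single collection that alone exceeds a limit as a likely cause of overflow is an assumption made for illustration, not a documented rule, and the count passed in is hypothetical.

```python
"""Rough headroom check of one collection's metric count against buffer limits."""
from __future__ import annotations

# Current values from metric_buffer_limit.sls as quoted in this article.
LIMITS = {"default": 300, "max_size": 4000}

# Assumption (illustrative, not documented): a collection that alone exceeds
# a limit is a likely source of "Metric buffer overflow" messages.


def headroom(metrics_per_collection: int, limit: int) -> int:
    """Return limit minus metrics; negative means one collection alone exceeds the limit."""
    return limit - metrics_per_collection


def report(metrics_per_collection: int) -> list[str]:
    """Describe the headroom against each configured limit."""
    lines = []
    for name, limit in LIMITS.items():
        room = headroom(metrics_per_collection, limit)
        status = "OK" if room >= 0 else "EXCEEDS"
        lines.append(f"{name}={limit}: {status} (headroom {room})")
    return lines


if __name__ == "__main__":
    # Hypothetical count of metric lines from one test run.
    for line in report(4500):
        print(line)
```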

To resolve the issue: 

Connect over SSH (for example, with PuTTY) to the cloud proxy that collects data from the Telegraf agent and edit the following file:

ucp/ucp-config-scripts/salt/pillar/metric_buffer_limit.sls

The current values are:

metric_buffer_limit:
  default: 300
  max_size: 4000

Change the "default" value to 10000 (10K).
Change the "max_size" value to 10000 (10K).
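Because this change has to be reapplied after each upgrade, it can help to script it. The sketch below is a hypothetical helper (set_buffer_limit.py is an illustrative name), assuming Python 3 is available on the cloud proxy and that the file contains exactly one "default:" line and one "max_size:" line as shown above. It validates the edit first, saves a .bak copy next to the file, then writes 10000 for both keys.

```python
"""Set default and max_size in metric_buffer_limit.sls, keeping a .bak copy."""
from __future__ import annotations

import re
import shutil
import sys

NEW_VALUE = 10000  # value recommended in this article for both keys


def set_limits(text: str, default: int, max_size: int) -> str:
    """Return text with the default and max_size numbers replaced, indentation unchanged."""
    # Assumes exactly one default and one max_size line, as in the article's example.
    text, n_default = re.subn(
        r"(?m)^([ \t]*default[ \t]*:[ \t]*)\d+", rf"\g<1>{default}", text
    )
    text, n_max = re.subn(
        r"(?m)^([ \t]*max_size[ \t]*:[ \t]*)\d+", rf"\g<1>{max_size}", text
    )
    if n_default != 1 or n_max != 1:
        raise ValueError(
            f"expected one default and one max_size line, found {n_default} and {n_max}"
        )
    return text


def main(path: str) -> None:
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    updated = set_limits(original, NEW_VALUE, NEW_VALUE)  # fails before anything is written
    shutil.copy2(path, path + ".bak")  # keep the unmodified file next to the edited one
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 set_buffer_limit.py <path to metric_buffer_limit.sls>
    main(sys.argv[1])
```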

Additional Information

Modifying the metric_buffer_limit.sls file unblocks the configuration so that agents can be deployed with the updated buffer limit, but it is not a permanent solution.

Changes to metric_buffer_limit.sls do not persist across an upgrade of Aria Operations.