Experiencing log loss in a Splunk Tile installation. doppler.log shows a consistent "Dropped (egress) X000 envelopes" error.
Product Version: 2.4
The error "Dropped (egress) X000 envelopes" indicates that the configured consumer(s) are not consuming logs at the same rate that Loggregator delivers them. This condition can lead to back pressure on the Doppler components and, eventually, dropped envelopes.
Example consumers are third-party nozzles from Splunk or New Relic, ingestor applications used by Healthwatch and/or Metrics, and troubleshooting tools such as cf top and cf logs that stream logs. As a general rule of thumb, scale the number of consumer instances to match the number of Dopplers. If the recommended scaling for the Loggregator components (https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-platform-for-cloud-foundry/6-0/tpcf/log-ops-guide.html) has already been done but the problem persists, the following steps can help determine which consumer is causing dropped envelopes on the consumer side.
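As a quick check of the consumer-to-Doppler ratio, the Doppler instance count can be read from the BOSH deployment. A minimal sketch, assuming BOSH CLI access (for example from the Ops Manager VM) and a TPCF deployment named cf-<guid>, which varies per environment:

# count Doppler instances in the TPCF deployment
bosh -d cf-<guid> instances | grep -c doppler

Compare this count against the instance count configured for each consumer, for example the Splunk nozzle's instance setting in its tile.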
sudo as root on the Doppler VM and tail the Doppler log:

/var/vcap/sys/log/doppler$ tail -f doppler.log
"Dropped (egress) X000 envelopes"
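To gauge whether the drops are ongoing, the occurrences in the current log can also be counted. A simple sketch, run as root on the Doppler VM (log path as above):

# count how many "Dropped (egress)" messages are in the current doppler.log
grep -c "Dropped (egress)" /var/vcap/sys/log/doppler/doppler.log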
1. Determine the number of subscriptions to the Firehose. The value reported should be consistent with the total number of configured consumers (as an admin):

curl -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=subscriptions{source_id="doppler"}'
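To extract just the numeric value(s) from the JSON response, the output can be piped to jq (assuming jq is available on the jumpbox; the field path follows the Prometheus-style response format shown in step 3):

curl -s -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=subscriptions{source_id="doppler"}' | jq -r '.data.result[].value[1]'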
2. Run the following commands to get ingress and dropped (egress) statistics.

Average ingress per Doppler per second (as an admin):
curl -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=avg(rate(ingress{source_id="doppler"}[5m]))'

Total ingress across all Dopplers per second (as an admin):
curl -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=sum(rate(ingress{source_id="doppler"}[5m]))'

Average dropped per Doppler per second (as an admin):
curl -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=avg(rate(dropped{source_id="doppler"}[5m]))'

Total dropped across all Dopplers per second (as an admin):
curl -G -H "Authorization: $(cf oauth-token)" "https://log-cache.<system-domain>/api/v1/query" --data-urlencode 'query=sum(rate(dropped{source_id="doppler"}[5m]))'
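The four queries can also be run in one pass. A minimal wrapper sketch, assuming a bash shell logged in to cf as admin, jq installed, and SYS_DOMAIN set to the environment's system domain (placeholder value below):

# run the ingress/dropped rate queries against Log Cache and print each value
SYS_DOMAIN=sys.example.com   # replace with the actual system domain
for q in \
  'avg(rate(ingress{source_id="doppler"}[5m]))' \
  'sum(rate(ingress{source_id="doppler"}[5m]))' \
  'avg(rate(dropped{source_id="doppler"}[5m]))' \
  'sum(rate(dropped{source_id="doppler"}[5m]))'
do
  echo "$q"
  curl -s -G -H "Authorization: $(cf oauth-token)" \
    "https://log-cache.${SYS_DOMAIN}/api/v1/query" \
    --data-urlencode "query=$q" | jq '.data.result[0].value[1]'
done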
3. The expected output is:
"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1574128855.000,"0"]}]}}[1574128855.000,"0"] demonstrates the epoch time representation of the call and "0" is the result.