This KB article illustrates a scenario in which log loss (i.e., dropped envelopes) occurs because of outdated or unused application nozzles. Outdated CF CLI plugins such as the FirehosePlugin, and idle or unused nozzles, increase pressure on the Loggregator VMs and can eventually cause log loss / dropped envelopes.
This KB article also assumes that you have already scaled the Loggregator, Log Cache, and Doppler VMs per the documentation and are still experiencing log loss / envelope drops. The ideal ratio of Dopplers to Loggregators to nozzles is 2:1:1. For example, 40 Dopplers, 20 Loggregators, and 20 Splunk / firehose nozzles match the ideal ratio: 40:20:20 = 2:1:1.
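The 2:1:1 arithmetic can be sketched as a small shell helper. The function name `ratio_targets` is hypothetical, and rounding up for an odd Doppler count is an assumption, since the article gives no rounding rule.

```shell
# Print the Loggregator and nozzle counts implied by a 2:1:1 ratio
# for a given number of Dopplers.
# Assumption: round up when the Doppler count is odd.
ratio_targets() {
  dopplers=$1
  target=$(( (dopplers + 1) / 2 ))
  echo "dopplers=$dopplers loggregators=$target nozzles=$target"
}
```

For example, `ratio_targets 40` prints `dopplers=40 loggregators=20 nozzles=20`, which matches the 40:20:20 example.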
Symptoms of this issue include the following:
Dropped-envelope messages in Doppler.stderr.log (from the Doppler VM), where the ShardID indicates the nozzle from which the drops originate. Based on the log output below, the drops are occurring in the New Relic nozzle and the FirehosePlugin:
2024-11-26T15:18:43.511264556Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:44.067754158Z Dropped 1000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:45.312025078Z Dropped 3000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:46.511093603Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:47.280569372Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:48.697958693Z Dropped 4000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:48.934935543Z Dropped 1000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:49.799814328Z Dropped 1000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:50.671530452Z Dropped 3000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:51.259809542Z Dropped 1000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:52.120642192Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:53.053222981Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:54.686482713Z Dropped 4000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:55.818981168Z Dropped 3000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:56.392525947Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-11-26T15:18:57.017061080Z Dropped 2000 envelopes (v1 buffer) ShardID: FirehosePlugin
2024-12-03T16:21:45.694002017Z Dropped 2000 envelopes (v2 buffer) ShardID: newrelic.firehose
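To see at a glance which ShardID accounts for the most drops, the log entries above can be totaled per shard. The following is a minimal sketch, assuming one log entry per line in the format shown above; the function name `tally_drops` is hypothetical.

```shell
# Sum dropped envelopes per ShardID from a copy of the Doppler stderr log.
# Assumes the format "<timestamp> Dropped <N> envelopes (<buffer>) ShardID: <name>",
# so the count is field 3 and the shard name is the last field.
# Usage: tally_drops <path to Doppler.stderr.log>
tally_drops() {
  awk '/Dropped [0-9]+ envelopes/ { total[$NF] += $3 }
       END { for (shard in total) print shard, total[shard] }' "$1" |
    sort -k2,2nr   # largest total first
}
```

Each output line contains a ShardID followed by its total dropped envelopes, so the nozzles or plugins worth investigating first appear at the top.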
There are a few options we can consider to mitigate this issue:
In the Doppler.stderr.log output shown above, envelopes are being dropped by the New Relic nozzle (ShardID: newrelic.firehose).
In this case, we can log into Apps Manager, and search for the keywords "nozzle" or "firehose":
From here, we can stop the unnecessary New Relic nozzles or delete them altogether.
In the Doppler.stderr.log output shown above, we can also see envelopes being dropped via the FirehosePlugin (ShardID: FirehosePlugin).
Given that the FirehosePlugin is unused, we can uninstall this CF CLI plugin that is contributing to the log loss / dropped envelopes via the command below:
cf uninstall-plugin FirehosePlugin
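Before uninstalling, it can help to confirm that the plugin is actually installed on the workstation in question. The following is a minimal sketch that checks the output of `cf plugins` first; the function name `uninstall_plugin_if_present` is hypothetical, and the match is a plain substring search on the listing.

```shell
# Uninstall a CF CLI plugin only if `cf plugins` lists it.
# Usage: uninstall_plugin_if_present FirehosePlugin
uninstall_plugin_if_present() {
  plugin=$1
  if cf plugins | grep -qF -- "$plugin"; then
    cf uninstall-plugin "$plugin"
  else
    echo "$plugin is not installed"
  fi
}
```

Because CF CLI plugins are installed per user and per machine, this check would need to be repeated wherever the plugin may have been installed.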
As an example, we scale down the number of Splunk nozzles from 54 to 20 and then run Apply Changes on only the Splunk Firehose Nozzle tile:
If you are using something like the newrelic or firehose-to-syslog nozzle, you can scale down the number of nozzles via the CF CLI command below:
cf scale $NOZZLE_APP_NAME -i <NUMBER>
cf install-plugin -r CF-Community "log-cache"
Run the log-meta command to identify the logging output of applications running on the foundation:
cf log-meta --guid --noise --sort-by rate | tee noisy_apps.txt
Review the noisy_apps.txt file. Note that the applications with the largest logging output (i.e., the "noisy" apps) are the ones at the bottom of the file:
8786f93e-c12d-4c21-82bb-5e3b57297e41  100000  2094164230   1m17s  106514
d378d7ff-1359-4fc4-82a7-604f55080fbe  100000  1641340419   53s    120295
c0e7e615-c82a-41ae-bdd5-3370444abfd6  100000  2241700486   45s    127611
984c4073-5c62-47e9-9d06-71c124a06203  100000  2682335689   1m5s   129968
system_metrics_agent                  100000  2698345127   32s    143044
6b4fcead-8631-4e3b-887a-c8aababe56db  100000  4012195180   28s    251197
gorouter                              100000  10336679854  14s    543042
As seen in the above output snippet of noisy_apps.txt, the noisiest application is the one with the GUID 6b4fcead-8631-4e3b-887a-c8aababe56db, which emits 251,197 logs per minute, or roughly 4,186 logs per second (251,197 logs / 60 seconds). The gorouter and system_metrics_agent entries are platform components rather than applications.
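The per-second conversion can be applied to every row of noisy_apps.txt at once. The following is a minimal sketch, assuming one source per line with the per-minute rate as the last column, as in the output above; the function name `top_noisy` is hypothetical.

```shell
# List the N noisiest sources from the saved log-meta output, with the
# per-minute rate converted to whole logs per second (rounded down).
# Lines whose last field is not a number (headers, status text) are skipped.
# Usage: top_noisy noisy_apps.txt 3
top_noisy() {
  awk 'NF >= 5 && $NF ~ /^[0-9]+$/ { print $1, $NF, int($NF / 60) }' "$1" |
    sort -k2,2nr |
    head -n "${2:-5}"   # default to the top 5 when N is omitted
}
```

Each output line contains the source ID, the rate per minute, and the rate per second. Platform components such as gorouter appear in the list alongside application GUIDs, so they need to be set aside when looking for applications to rate limit.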
cf curl /v3/apps/<APP GUID> | jq -r '.name'
To use the -l parameter of the cf scale command, you must be using a version of the CF CLI above 8.5. More information is available in this KB article. As an example, we will assign a log rate limit of 100 bytes or lines per second (the default, per this KB article). Note that setting the log rate limit restarts the application:
cf scale <App Name> -l 100B
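The CLI version requirement can be checked before running the command above. The following is a minimal sketch that parses the output of `cf version`; the function name `cf_cli_supports_log_rate_limit` is hypothetical, and treating 8.5 itself as sufficient is an assumption about what "above version 8.5" means.

```shell
# Return 0 if the installed CF CLI reports version 8.5 or later.
# Assumption: `cf version` prints a line of the form
#   cf version 8.7.1+9c81242.2023-06-15
cf_cli_supports_log_rate_limit() {
  version=$(cf version | sed -E 's/^cf version ([0-9]+\.[0-9]+).*/\1/')
  major=${version%%.*}   # text before the first dot
  minor=${version#*.}    # text after the first dot
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 5 ]; }
}
```

A typical use is `cf_cli_supports_log_rate_limit && cf scale <App Name> -l 100B`, so that the scale command is only attempted when the flag is expected to be available.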
After implementing one or more of the options above, we would expect the rate of dropped envelopes and the RLP Message Loss rate to decrease, or the drops to stop entirely: