App log rate limiting is useful for throttling very chatty or busy apps. Apps that generate excessive logs can adversely affect the overall logging system. When too many logs come from one app, they can saturate Loggregator bandwidth and cause dropped metrics for Healthwatch and other log consumers.
VMware recommends using, at minimum, the default limit of 100 log lines per second to avoid feedback latency. However, each customer must decide the minimum number of log entries per second that meets their needs for debugging, trend analysis, and so on.
1. When it is enabled, what is the cut-off threshold per app instance before it is marked noisy and put on hold?
See step 14 in Configure App Containers. At minimum, VMware recommends using the default limit of 100.
2. How long is the hold? When will log injection into the log stream resume?
The app logs are not put on hold; they are buffered and throttled to the maximum number of lines per second that you have configured.
3. If we set the limit to 100, is it correct to say "when the hold is released, app instance logs will be pushed at the rate of 100 lines/sec"? When will buffering stop?
It depends on the app. If an extremely noisy app instance has its logs buffered and released at 100 lines per second, it may take up to, say, 2 minutes to clear the buffer.
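As a rough illustration of the arithmetic behind that estimate, the following sketch computes how long a backlog takes to drain at a given limit. The function name, the backlog of 12,000 lines, and the incoming rates are hypothetical values chosen for illustration; they are not measurements from the platform, and this is not how the Diego Cell components compute anything.

```python
import math


def drain_time_seconds(backlog_lines, limit_per_second, incoming_per_second=0.0):
    """Estimate seconds needed to empty a log backlog (illustrative only).

    backlog_lines:       lines already waiting (hypothetical value)
    limit_per_second:    configured rate limit, e.g. 100
    incoming_per_second: rate at which the app keeps logging (assumption)
    """
    if limit_per_second <= 0:
        raise ValueError("limit_per_second must be positive")
    net_rate = limit_per_second - incoming_per_second
    if net_rate <= 0:
        # The app logs at least as fast as the limit: the backlog never clears.
        return math.inf
    return backlog_lines / net_rate


# A hypothetical backlog of 12,000 lines released at 100 lines per second,
# with the app now quiet, takes 120 seconds (2 minutes) to clear.
print(drain_time_seconds(12_000, 100))
# If the app keeps logging at 50 lines per second, the same backlog takes longer.
print(drain_time_seconds(12_000, 100, incoming_per_second=50))
```

Under these assumptions, the drain time depends on how far the app's ongoing log rate sits below the limit, which is why the answer varies by app.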
4. Where exactly is the buffer located?
The buffering occurs on the Diego Cells where the app is running.
5. How will those VMs be affected by this buffering? More CPU/DISK/Memory?
Since there is some processing required to implement the buffering and throttling of the logs, and some memory is used as the buffer, there is some additional load placed on both CPU and memory.
6. Does it affect cf logs APP --recent as well?
The output of cf logs APP --recent is affected only to the extent that recent logs might still be in the buffer and not yet released to the Loggregator system.
7. Is app performance impacted by the buffering?
The numerical limit is the number of log lines per second that are retained for each application instance.
a. Logs over the limit will be discarded and not appear in the output of cf logs.
b. It seems the original intent of this feature was to protect both the logging system and the components on the Diego Cells. For more information, refer to **Explore** The effects of applying a rate limit to app instance logging at various points in the system to inform solution decisions for the "Noisy Neighbor" problem.
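To make the per-instance limit described in this answer concrete, here is a minimal sketch of a fixed one-second window limiter that keeps at most a given number of lines per second and discards the rest. It is an illustration of the general technique only: the function name and the sample timestamps are hypothetical, the timestamps are assumed to be sorted, and the actual mechanism on the Diego Cells may differ.

```python
def apply_rate_limit(timestamps, limit_per_second):
    """Return the log-line timestamps that survive a per-second limit.

    timestamps:       log-line times in seconds, assumed sorted ascending
    limit_per_second: maximum lines retained in each one-second window
    """
    kept = []
    current_window = None  # whole second currently being counted
    count = 0              # lines already kept in that second
    for ts in timestamps:
        window = int(ts)
        if window != current_window:
            # A new one-second window starts: reset the counter.
            current_window = window
            count = 0
        if count < limit_per_second:
            kept.append(ts)
            count += 1
        # Otherwise the line is over the limit and is discarded.
    return kept


# Hypothetical app instance: 250 lines in the first second, 50 in the next.
burst = [i / 250 for i in range(250)] + [1 + i / 50 for i in range(50)]
kept = apply_rate_limit(burst, limit_per_second=100)
print(len(burst), len(kept))  # 300 lines emitted, 150 retained
```

In this example the first second is capped at 100 lines while the second, quieter second passes through untouched, so only the noisy interval loses lines.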
8. Does app log rate limiting reduce system component resource usage?
We tested app log rate limiting with a very chatty app. Before we implemented rate limiting, the rep process consumed ~90% CPU, loggregator-agent ~51%, forwarder-agent ~60%, and doppler ~85%.
After we added rate limiting, the rep process consumed <1% CPU, loggregator-agent <1%, forwarder-agent <1%, and doppler ~6%.