max_per_source.
To diagnose whether Autoscaler will make proper scaling decisions, compare the Log Cache cache duration for your app to the Autoscaler metric collection interval. Run cf log-meta and note the value under "Cache Duration" for your app. To confirm, you can see the exact request rate that Autoscaler has calculated by running cf autoscaling-events APP_NAME.
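As a quick check, that comparison can be written out directly: if the "Cache Duration" reported for your app is shorter than the metric collection interval, Autoscaler cannot see a full window of metrics. A minimal Python sketch of the check (the 16-second value mirrors the sample cf log-meta output later in this article; substitute your app's values):

cache_duration_seconds = 16          # "Cache Duration" for the app, from cf log-meta
metric_collection_interval = 120     # App Autoscaler metric collection interval (default, in seconds)

# if the cache holds less history than Autoscaler's window, request counts will be incomplete
if cache_duration_seconds < metric_collection_interval:
    print("Log Cache retains less history than Autoscaler needs for this app")
else:
    print("Log Cache retains at least one full metric collection window for this app")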
For each request made to an application on the platform, a number of log and metric envelopes are generated. Log Cache stores log and metric envelopes by source-id. For an application, the source-id is the application GUID.
As of TAS for VMs 2.9, for each request to an application, the source-id will have the following envelopes recorded:
HttpStartStop metric - Generated by the Gorouter
HttpStartStop metric - Generated by the Diego Cell

Log Cache supports the configuration of a maximum number of envelopes to retain in RAM per source-id; the maximum number is the same for all source-ids.
This value is configurable in TAS on the Advanced Features form, under the description "Maximum number of envelopes stored in Log Cache per source" (referred to here as max_per_source). It defaults to 100,000 envelopes.
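To see why the default may be too small for a busy app, here is a back-of-the-envelope Python sketch using the request rate and envelopes-per-request figures from the worked example later in this article; it shows how little history 100,000 envelopes represents:

max_per_source = 100_000                 # default envelope limit per source-id
application_requests_per_second = 2000   # illustrative HTTP request rate to the app
metrics_per_request = 3                  # envelopes generated per request

envelopes_per_second = application_requests_per_second * metrics_per_request
seconds_of_history = max_per_source / envelopes_per_second
print(f"{seconds_of_history:.1f} seconds of history")   # ~16.7 seconds, well short of a 120-second window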
Once Log Cache has reached the envelope limit for a given source-id, each new envelope causes the oldest envelope in the cache for that source-id to be removed. Autoscaler may undercount HTTP requests when it starts reading from a timestamp and Log Cache then removes envelopes newer than that timestamp before Autoscaler has read them.
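To make that undercounting mechanism concrete, here is a purely illustrative Python sketch (tiny numbers, not real envelope volumes) of how envelopes newer than Autoscaler's starting timestamp can be evicted before they are read:

from collections import deque

max_per_source = 5                     # tiny cache, for illustration only
cache = deque(maxlen=max_per_source)   # appending to a full deque drops the oldest entry

# envelopes with timestamps 1..5 arrive; Autoscaler starts reading at timestamp 2
for ts in range(1, 6):
    cache.append(ts)
read_start = 2

# while Autoscaler is still paging through results, three newer envelopes arrive
for ts in range(6, 9):
    cache.append(ts)                   # evicts timestamps 1, 2 and 3

# timestamps 2 and 3 were newer than read_start but are gone, so the window count comes up short
print([ts for ts in cache if ts >= read_start])   # [4, 5, 6, 7, 8]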
For Autoscaler to function correctly, it is important that max_per_source be sufficiently large to allow a contiguous stream of envelopes to be retrieved. This has implications for how you scale your Doppler instance group (see the Additional Information section below).
Autoscaler has two properties, configurable via the TAS App Autoscaler form, that are relevant when calculating the appropriate size for the cache:
metric_collection_interval - The size of the window of metrics that App Autoscaler uses to make scaling decisions (in seconds). This defaults to 120 seconds.
scaling_interval - How frequently App Autoscaler evaluates an app for scaling. This defaults to 35 seconds.

Other variables:
application_requests_per_second - The number of HTTP requests to the application per second
log_cache_request_time - The time taken for Autoscaler to request a single page from Log Cache
max_per_source - The maximum number of envelopes Log Cache will hold for an individual application
metrics_per_request - The number of envelopes generated for each HTTP request. Some envelopes are generated by the platform and others are generated for each application log line.

The Autoscaler starts reading at the more recent of either the metric collection interval seconds ago or the oldest envelope in the cache.
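That starting-point rule can be written down directly; the following is a small sketch (the function and variable names are ours, for illustration only):

import time

metric_collection_interval = 120   # seconds (default)

def read_start(oldest_envelope_timestamp, now=None):
    """The timestamp Autoscaler starts reading from: the more recent of
    (now - metric_collection_interval) and the oldest envelope left in the cache."""
    if now is None:
        now = time.time()
    return max(now - metric_collection_interval, oldest_envelope_timestamp)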
Log Cache should be configured with a value for max_per_source that allows the age of the oldest envelope in the cache to be greater than the metric collection interval, so that all the metrics can be fetched before they are removed.
The following expression must be true for Autoscaler to accurately calculate the HTTP throughput.
ok = # removed up to position (max_per_source - (application_requests_per_second * metrics_per_request * (scaling_interval - 5))) > # position autoscaler reads up to (min(application_requests_per_second * metrics_per_request * metric_collection_interval, max_per_source) - ((scaling_interval - 5)/log_cache_request_time) * 1000 (rounded down to the nearest thousand))
This formula is a bit complicated, so to make evaluation easier you can use this prepared calculator. The following example also walks through the calculation.
Given the following values:
log_cache_request_time = 0.4329
max_per_source = 850,000
metric_collection_interval = 120
metrics_per_request = 3
application_requests_per_second = 2000
scaling_interval = 35
Then
ok = # removed up to position 850,000 - (2000 * 3 * (35 - 5)) > # position autoscaler reads up to (min(2000 * 3 * 120, 850,000) - ((35 - 5)/0.4329) * 1000 (rounded down to the nearest thousand))
ok = # removed up to position 850,000 - 180,000 > # position autoscaler reads up to (min(720,000, 850,000) - ((35 - 5)/0.4329) * 1000 (rounded down to the nearest thousand))
ok = # removed up to position 850,000 - 180,000 > # position autoscaler reads up to 720,000 - 69,000
ok = # removed up to position 670,000 > # position autoscaler reads up to 651,000
ok = true
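If you prefer scripting the check to using the calculator, the expression can be evaluated with a short Python sketch. This is our own transcription of the formula, not Autoscaler's code; the variable names mirror the definitions above and the values are those from the worked example:

import math

def cache_large_enough(max_per_source, application_requests_per_second, metrics_per_request,
                       metric_collection_interval, scaling_interval, log_cache_request_time):
    # position Log Cache has removed envelopes up to
    removed_up_to = max_per_source - (
        application_requests_per_second * metrics_per_request * (scaling_interval - 5))
    # position Autoscaler reads up to; the formula's "* 1000 (rounded down to the nearest
    # thousand)" term appears to reflect pages of 1,000 envelopes
    pages = math.floor((scaling_interval - 5) / log_cache_request_time)
    reads_up_to = min(application_requests_per_second * metrics_per_request * metric_collection_interval,
                      max_per_source) - pages * 1000
    return removed_up_to > reads_up_to

print(cache_large_enough(max_per_source=850_000, application_requests_per_second=2000,
                         metrics_per_request=3, metric_collection_interval=120,
                         scaling_interval=35, log_cache_request_time=0.4329))   # True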
Note that this expression assumes metrics_per_request is constant. An application that emits bursts of log lines (perhaps outputting large stack traces) could throw off the HTTP throughput calculation.
The max_per_source property is configured once for Log Cache as a whole, and should be sized to accommodate your noisiest application.
As the number of requests on the platform increases, the time window used to determine the number of requests will narrow.
If you have ensured that your Log Cache max_per_source is sufficiently large but are still seeing incorrect throughput calculated, you may need to scale your logging pipeline or other components in your foundation. See the Additional Information section below for instructions.
VMware Tanzu Support recommends that you load test your specific application to confirm that Autoscaler will make correct scaling decisions given the platform configuration.
The Log Cache CF CLI plugin provides the cf log-meta command, which you can use to get the cache duration.
$ cf install-plugin -r CF-Community 'log-cache'
$ cf log-meta
Retrieving log cache metadata as admin...

Source   Source Type   Count    Expired    Cache Duration
...
pora     application   100000   68320906   16s
...
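The Count and Cache Duration columns together give the app's envelope rate, from which you can roughly estimate the max_per_source needed to cover a full metric collection window. A minimal Python sketch using the sample output above (the precise check remains the formula earlier in this article):

count = 100_000                    # "Count" column for the app
cache_duration_seconds = 16        # "Cache Duration" column for the app
metric_collection_interval = 120   # App Autoscaler window (default, in seconds)

envelopes_per_second = count / cache_duration_seconds
estimated_max_per_source = envelopes_per_second * metric_collection_interval
print(round(envelopes_per_second), round(estimated_max_per_source))   # 6250 envelopes/s, 750000 envelopes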
Additional information can be found below: