Tuning Autoscaler HTTP throughput rules and Loggregator to run successfully



Article ID: 297416


Updated On: 03-13-2025

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The Autoscaler that is a part of VMware Tanzu Application Service (TAS) for VMs operates by loading metrics from Log Cache. It is important that the logging system is tuned properly or Autoscaler may lack the information required to make correct autoscaling decisions, especially when scaling by throughput.

The Autoscaler can under-report the number of requests per second when the logging and metric envelopes generated exceed the configuration of the Log Cache property max_per_source.

This article explains how to check whether the log and metric volume from your application can be handled by Log Cache in your environment, or whether Loggregator needs to be scaled up.

Resolution

The Process

To diagnose whether Autoscaler will make proper scaling decisions, compare the Log Cache cache duration for your app to the Autoscaler metric collection interval.

  1. Run cf log-meta and note the value under "Cache Duration" for your app.
  2. Find the metric collection interval defined for Autoscaler. This is in Ops Manager under the TAS tile.
  3. If the Cache Duration from step 1 is less than the metric collection interval from step 2, then Autoscaler is likely under-scaling your application.

To confirm, you can see the exact request rate that Autoscaler has calculated by running cf autoscaling-events APP_NAME.


Digging Deeper

For each request made to an application on the platform a number of logging and metric envelopes are generated. Log Cache stores log and metric envelopes by source-id. For an application, the source-id is the application GUID.

As of TAS for VMs 2.9, for each request to an application, the source-id will have the following envelopes recorded:
 

  • 1 server HttpStartStop metric - Generated by the Go Router
  • 1 client HttpStartStop metric - Generated by the Diego Cell
  • 1 envelope per application log line - This is application-specific, some applications may generate multiple envelopes

Log Cache supports the configuration of a maximum number of envelopes to retain in RAM per source-id: the maximum number is the same for all source-ids.

This value is configurable in TAS on the Advanced Features form, under the description Maximum number of envelopes stored in Log Cache per source (referred to here as max_per_source). It defaults to 100,000 envelopes.

Once Log Cache has reached the envelope limit for a given source-id, each new envelope causes the oldest envelope in the cache for that source-id to be removed. Autoscaler may undercount HTTP requests when it reads from a starting timestamp and Log Cache then removes envelopes newer than that timestamp.
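This per-source eviction behaves like a fixed-size FIFO buffer. The following sketch (not the actual Log Cache implementation, just an illustration of the eviction rule) uses a small max_per_source of 5 to show the oldest envelopes being dropped as new ones arrive:

```python
from collections import deque

# Illustrative model of Log Cache's per-source eviction: once a source-id
# holds max_per_source envelopes, each new envelope evicts the oldest.
# max_per_source is set to 5 here purely for illustration.
cache = deque(maxlen=5)
for timestamp in range(8):   # 8 envelopes arrive for one source-id
    cache.append(timestamp)

print(list(cache))           # the three oldest envelopes were evicted
# → [3, 4, 5, 6, 7]
```

If Autoscaler had started reading at timestamp 0, everything before timestamp 3 would already be gone, and the request count would come up short.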

For Autoscaler to function correctly, it is important that max_per_source be sufficiently large to allow a contiguous stream of envelopes to be retrieved. This has implications for how you scale your Doppler instance group (see the Additional Information section below).

The Autoscaler has two configurable properties (configurable via the TAS App Autoscaler form) that are relevant when calculating the appropriate size for the cache:
 

  • metric_collection_interval - The size of the window of metrics that App Autoscaler uses to make scaling decisions (in seconds). This defaults to 120 seconds.
  • scaling_interval - How frequently App Autoscaler evaluates an app for scaling. This defaults to 35 seconds.

Other variables:
 

  • application_requests_per_second - The number of HTTP requests to the application per second
  • log_cache_request_time - The time taken for Autoscaler to request a single page from Log Cache
  • max_per_source - The maximum number of envelopes Log Cache will hold for an individual application
  • metrics_per_request - The number of envelopes generated for each HTTP request. Some envelopes are generated by the platform and others are generated for each application log line.

The Autoscaler starts reading at the more recent of either the metric collection interval seconds ago or the oldest envelope in the cache.

Log Cache should be configured with a value for max_per_source that allows the age of the oldest timestamp to be greater than the metric collection interval so that all the metrics can be fetched before they are removed.
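Assuming a constant request rate, a rough lower bound on max_per_source is simply the number of envelopes the application generates during one metric collection interval. The values below are illustrative (they match the worked example later in this article), not a recommendation:

```python
# Rough lower bound on max_per_source: the cache must retain at least the
# envelopes generated during one metric collection interval.
# Illustrative values only.
application_requests_per_second = 2000
metrics_per_request = 3            # 2 HttpStartStop metrics + 1 app log line
metric_collection_interval = 120   # seconds

min_envelopes = (application_requests_per_second
                 * metrics_per_request
                 * metric_collection_interval)
print(min_envelopes)
# → 720000
```

In practice max_per_source must be larger than this bound, because eviction continues while Autoscaler is paging through the cache; the expression below accounts for that.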

The following expression must be true for Autoscaler to accurately calculate the HTTP throughput.

ok =
  # removed up to position
  (max_per_source - (application_requests_per_second * metrics_per_request * (scaling_interval - 5)))

  >

  # position autoscaler reads up to
  (min(application_requests_per_second * metrics_per_request * metric_collection_interval, max_per_source) - ((scaling_interval - 5)/log_cache_request_time) * 1000 (rounded down to the nearest thousand))

This formula is a bit complicated, so to make evaluation easier you can use this prepared calculator. The following example also walks through the calculation.

Given the following values:

log_cache_request_time = 0.4329
max_per_source = 850,000
metric_collection_interval = 120
metrics_per_request = 3
application_requests_per_second = 2000
scaling_interval = 35

Then

ok =
  # removed up to position
  850,000 - (2000 * 3 * (35 - 5))

  >

  # position autoscaler reads up to
  (min(2000 * 3 * 120, 850,000) - ((35 - 5)/0.4329) * 1000 (rounded down to the nearest thousand))

ok =
  # removed up to position
  850,000 - 180,000

  >

  # position autoscaler reads up to
  (min(720,000, 850,000) - ((35 - 5)/0.4329) * 1000 (rounded down to the nearest thousand))

ok =
  # removed up to position
  850,000 - 180,000

  >

  # position autoscaler reads up to
  720,000 - 69,000

ok =
  # removed up to position
  670,000

  >

  # position autoscaler reads up to
  651,000

ok = true
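The worked example above can be expressed as a small helper function. This is a sketch of the article's expression (the function name and the 1,000-envelope page size assumption are ours; Autoscaler fetches Log Cache results in pages, and the formula models throughput as 1,000 envelopes per request):

```python
import math

def autoscaler_throughput_ok(max_per_source, application_requests_per_second,
                             metrics_per_request, metric_collection_interval,
                             scaling_interval, log_cache_request_time):
    """Sketch of this article's throughput check (helper name is ours)."""
    window = scaling_interval - 5
    # Position eviction has removed up to while Autoscaler reads
    removed_up_to = max_per_source - (application_requests_per_second
                                      * metrics_per_request * window)
    # Position Autoscaler reads up to: the envelopes it wants, minus what it
    # can fetch in the window at 1,000 envelopes per page, rounded down
    wanted = min(application_requests_per_second * metrics_per_request
                 * metric_collection_interval, max_per_source)
    pages = math.floor(window / log_cache_request_time)
    reads_up_to = wanted - pages * 1000
    return removed_up_to > reads_up_to

print(autoscaler_throughput_ok(
    max_per_source=850_000,
    application_requests_per_second=2000,
    metrics_per_request=3,
    metric_collection_interval=120,
    scaling_interval=35,
    log_cache_request_time=0.4329))
# → True
```

With these values the function reproduces the example: 670,000 removed vs. 651,000 read, so the check passes.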


Note that this expression assumes metrics_per_request is constant. An application that encounters bursts of log lines (perhaps outputting large stacktraces) could throw off HTTP throughput calculation.

The max_per_source property is configured once for Log Cache as a whole, and should be sized to be larger than your noisiest application.

As the number of requests on the platform increases, the time window available for determining the number of requests will narrow.

If you have ensured that your Log Cache max_per_source is sufficiently large but are still seeing incorrect throughput calculated you may need to scale your logging pipeline or other components in your foundation. See the Additional Information section below for instructions.

VMware Tanzu Support recommends that you load test your specific application to confirm that Autoscaler will make correct scaling decisions given the platform configuration.


The log cache CF CLI plugin provides the cf log-meta command, which you can use to get the cache duration.

$ cf install-plugin -r CF-Community 'log-cache'
$ cf log-meta
Retrieving log cache metadata as admin...
Source                         Source Type  Count   Expired   Cache Duration
...
pora                           application  100000  68320906  16s
...
 

Additional information can be found below: