Consuming logs/metrics through the external V2 Firehose API shows reduced performance

Article ID: 297461

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The Loggregator Firehose, which exposes application logs and application or component metrics, is accessible through a few different APIs:
  • The V1 Firehose API (wss://doppler.SYSTEM_DOMAIN), which is provided by the Traffic Controller. Historically, this has been used by most integrations.
  • The internal V2 Firehose API (over gRPC), which is provided by the Reverse Log Proxy (RLP). This has been present since at least Tanzu Application Service (TAS) for VMs 2.0, and is used by many internal components such as Log Cache, Healthwatch and Syslog Adapters.
  • The external V2 Firehose API (https://log-stream.SYSTEM_DOMAIN), which is provided by the RLP Gateway. This was added in TAS for VMs 2.4, and some integrations have switched to it.
You may experience the following issues with integrations that use the external V2 Firehose API:
  • Overall throughput is lower: considerably fewer logs/metrics per second reach their destination with the same amount of system resources.
  • High CPU and memory usage on the nozzles/firehose consumers and the Traffic Controller VMs.
  • Loss of logs when clients disconnect. This occurs, at minimum, every 14 minutes as the RLP Gateway refreshes connections. This loss will be exacerbated by any additional load balancers in front of the foundation's Gorouter.
The first two issues are due to the conversion to and from JSON involved in logs/metrics transport through the RLP Gateway.
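To make the conversion cost more concrete, here is a minimal Go sketch of the work a consumer of a JSON-over-HTTP stream does for every event: strip the event framing, then decode a JSON batch back into structs. The `envelope` and `batch` types, the `data:` line framing, and the sample payload are simplified assumptions for illustration only. They are not the actual Loggregator envelope schema or the exact RLP Gateway wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// envelope is a simplified, hypothetical stand-in for a Loggregator envelope.
type envelope struct {
	Timestamp string            `json:"timestamp"`
	SourceID  string            `json:"source_id"`
	Tags      map[string]string `json:"tags,omitempty"`
}

// batch is a hypothetical wrapper carrying several envelopes per event.
type batch struct {
	Batch []envelope `json:"batch"`
}

// decodeEvents walks a stream body made of "data: <json>" lines
// (an assumed event framing) and returns every decoded envelope.
func decodeEvents(body string) ([]envelope, error) {
	var out []envelope
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank lines and non-data lines such as heartbeats
		}
		payload := strings.TrimPrefix(line, "data:")
		var b batch
		// Every event pays for a full JSON parse and fresh allocations.
		if err := json.Unmarshal([]byte(payload), &b); err != nil {
			return nil, fmt.Errorf("decoding event: %w", err)
		}
		out = append(out, b.Batch...)
	}
	return out, nil
}

func main() {
	body := "data: {\"batch\":[{\"timestamp\":\"1\",\"source_id\":\"app-a\"}]}\n\n" +
		"data: {\"batch\":[{\"timestamp\":\"2\",\"source_id\":\"app-b\"}]}\n\n"
	envs, err := decodeEvents(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(envs), envs[0].SourceID)
}
```

A gRPC consumer receives the same data as binary protobuf messages and skips this text parsing step, which is consistent with the lower overhead of the internal V2 API.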

Resolution

Workaround

At this time, there is no direct solution to these throughput and resource usage concerns. We recommend one of the following:

1. Use the aggregate log and metric drain destinations feature, which is supported since Tanzu Application Service (TAS) for VMs 2.8. App logs and metrics can be forwarded from the Diego Cells directly to an external log management system, so Dopplers and Traffic Controllers are either not required or are used only for Log Cache. This removes any bottlenecks or performance issues introduced by Dopplers and Traffic Controllers at the root.

2. Return to using the V1 Firehose API. This API was previously planned to be deprecated, but will now be available in all versions of TAS for VMs going forward.

3. Use the internal V2 Firehose API via gRPC, as in the following example: firehose-nozzle-v2 / rip /. This involves:

  • Ensuring that the Application Security Groups in place on the platform allow applications to contact the Traffic Controller VMs.
  • Getting Mutual TLS credentials, rather than OAuth credentials. See this example for generating these for tile-deployed apps: firehose-nozzle-v2 / rip-tile /
  • Switching from the RLPGatewayClient to the EnvelopeStreamConnector.
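The mutual TLS step above can be sketched with the Go standard library alone. This is a minimal illustration, not the code from the linked examples: `newMutualTLSConfig` is a hypothetical helper name, and the server name `reverselogproxy` used in `main` is an assumption that should be checked against the certificate your RLP actually presents. A config built this way is the kind of value a gRPC client such as the EnvelopeStreamConnector is given in place of an OAuth token.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newMutualTLSConfig (hypothetical helper) builds a client-side TLS config
// that presents a client certificate and verifies the server against the
// supplied CA bundle.
func newMutualTLSConfig(caPEM, certPEM, keyPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	// The client certificate and key are what replace OAuth credentials.
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	return &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{cert},
		ServerName:   serverName, // must match a name in the server certificate
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// Placeholder input: real PEM data would come from the credentials
	// issued to the nozzle, so this call is expected to report an error.
	_, err := newMutualTLSConfig([]byte("not a pem"), nil, nil, "reverselogproxy")
	fmt.Println("config error:", err)
}
```

Failing fast on an unreadable CA or key pair keeps credential problems visible at startup rather than surfacing later as opaque connection errors.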