How does the Loggregator Work in Tanzu Platform Cloud Foundry

Article ID: 297438


Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

This article applies to all versions of Tanzu Application Service up to 2.12.x. Beginning with TAS 2.13, log cache was broken out as its own instance group. See this KB for more detail on the new architecture of log cache and how it fits into the Loggregator system.

This article discusses how the Loggregator works in Tanzu Platform Cloud Foundry.

Loggregator simply transports log messages from the stdout or stderr of an app to the endpoint of Cloud Foundry logging, namely the Firehose. Persistence of the logs is the responsibility of whatever consumes them, for example log aggregators such as ELK stacks or Splunk, or simply cf logs.

Log messages that are not immediately extracted and persisted are discarded. The exception is the small number of recent logs held in a buffer and available through cf logs --recent.
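For illustration, the following minimal Go sketch shows what this means from the app's side: the app only writes to stdout and stderr and leaves transport and persistence to the platform. The loop and log messages here are hypothetical, not taken from any real app.

package main

import (
	"log"
	"os"
	"time"
)

func main() {
	// Anything written to stdout/stderr is picked up by Loggregator;
	// the app does not write log files or talk to the Firehose itself.
	info := log.New(os.Stdout, "", log.LstdFlags)
	errs := log.New(os.Stderr, "", log.LstdFlags)

	for {
		info.Println("handled request")         // appears as an OUT log line
		errs.Println("upstream call timed out") // appears as an ERR log line
		time.Sleep(10 * time.Second)
	}
}

A developer then streams these lines with cf logs APP_NAME, or replays the buffered ones with cf logs APP_NAME --recent.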


Resolution

The Loggregator Components

Loggregator transports log messages via a chain of components from applications to either the Firehose or a Doppler syslog drain. Various components of the chain are scaled horizontally as necessary for load.

These components fulfill the following functions:


Metron:

Provides the entry point to the Loggregator chain on each VM that has either logs or metrics to send through Loggregator. It accepts messages from clients, groups them into full buffers for efficient transport, and load balances by randomly selecting a different Doppler for each buffer transmission.
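As a rough sketch of that batching and random selection, the self-contained Go program below groups messages into buffers and picks a random Doppler per transmission. The envelope and doppler types, the batch size of 100, and the addresses are hypothetical stand-ins, not the real Metron agent code or protocol.

package main

import (
	"fmt"
	"math/rand"
)

// Hypothetical stand-ins; the real Metron agent speaks the dropsonde protocol.
type envelope string

type doppler struct{ addr string }

func (d doppler) send(batch []envelope) {
	fmt.Printf("sent %d envelopes to %s\n", len(batch), d.addr)
}

// flush transmits one full buffer to a randomly chosen Doppler,
// which is how Metron load balances across the Doppler pool.
func flush(buffer []envelope, dopplers []doppler) {
	target := dopplers[rand.Intn(len(dopplers))]
	target.send(buffer)
}

func main() {
	dopplers := []doppler{{"doppler-0:8082"}, {"doppler-1:8082"}}
	var buffer []envelope
	for i := 0; i < 250; i++ {
		buffer = append(buffer, envelope(fmt.Sprintf("msg-%d", i)))
		if len(buffer) == 100 { // illustrative batch size
			flush(buffer, dopplers)
			buffer = buffer[:0]
		}
	}
	if len(buffer) > 0 {
		flush(buffer, dopplers)
	}
}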


Doppler:

The Doppler is the primary transport mechanism. It provides load-balanced store-and-forward of log messages, and it supports outputting either to the Traffic Controller (for the Firehose) or to bound syslog drains.

It provides a small amount of per-app buffering, typically 100 messages, to support cf logs --recent.
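That per-app buffer can be pictured as a small fixed-capacity queue that discards the oldest entry when full. The Go sketch below is illustrative only: the recentBuffer type is hypothetical, and the 100-message capacity simply reflects the typical figure quoted above, not Doppler's actual implementation.

package main

import "fmt"

// recentBuffer models the small per-app buffer a Doppler keeps so that
// cf logs --recent can replay the last N messages. Nothing is persisted.
type recentBuffer struct {
	capacity int
	messages []string
}

func (b *recentBuffer) add(msg string) {
	b.messages = append(b.messages, msg)
	if len(b.messages) > b.capacity {
		// The oldest message is discarded once the buffer is full.
		b.messages = b.messages[1:]
	}
}

func (b *recentBuffer) recent() []string { return b.messages }

func main() {
	buf := &recentBuffer{capacity: 100} // typical per-app buffer size
	for i := 0; i < 205; i++ {
		buf.add(fmt.Sprintf("log line %d", i))
	}
	fmt.Println(len(buf.recent()), "messages available to --recent") // prints 100
}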


Traffic Controller:

Re-aggregates the parallel log streams transported by the Dopplers. It provides endpoints for clients (Nozzles or the CLI) to request output of either:

  • All logs and metrics - this is the Firehose.
  • All logs and metrics for a single app - sometimes called the Garden hose.

On request, the Traffic Controller opens parallel WebSocket connections to all Dopplers, requesting either the logs of a single app or the complete log stream. For each request, the Traffic Controller maintains a separate aggregated stream until the client releases the connection.
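Conceptually this is a fan-in: many per-Doppler streams are merged into a single aggregated stream per client request. The Go sketch below illustrates the idea with plain channels standing in for the WebSocket connections; the merge function and its types are hypothetical, not Traffic Controller code.

package main

import (
	"fmt"
	"sync"
)

// merge fans several per-Doppler streams into one aggregated stream and
// keeps it open until all inputs close (standing in for the client
// releasing the connection).
func merge(dopplerStreams ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, stream := range dopplerStreams {
		wg.Add(1)
		go func(s <-chan string) {
			defer wg.Done()
			for msg := range s {
				out <- msg
			}
		}(stream)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	d0, d1 := make(chan string), make(chan string)
	go func() { d0 <- "doppler-0: app log line"; close(d0) }()
	go func() { d1 <- "doppler-1: app log line"; close(d1) }()
	for msg := range merge(d0, d1) {
		fmt.Println(msg)
	}
}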


Nozzles:

Provide the mechanism for transporting log and metric data from the Firehose to the downstream aggregating system of choice (such as ELK, Splunk, or Datadog). They read messages from the Firehose in the custom Loggregator dropsonde protocol and convert them into the appropriate downstream format, such as syslog. They can also perform other functions, such as enriching a log message with application details resolved from the included app GUID. There are several solutions in the Cloud Foundry OSS community for targets such as Syslog, Graphite, and Kafka.
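At its core a nozzle is a loop that reads envelopes from the Firehose and re-emits them in the downstream system's format. The Go sketch below is a hypothetical illustration only: the envelope struct, the toSyslog function, and the output format are stand-ins, not the dropsonde protocol or any particular nozzle's code.

package main

import (
	"fmt"
	"time"
)

// envelope is a stand-in for a dropsonde envelope read off the Firehose.
type envelope struct {
	appGUID   string
	timestamp time.Time
	message   string
}

// toSyslog converts an envelope into a syslog-style line for a downstream
// system such as ELK or Splunk; the format here is illustrative only.
func toSyslog(e envelope) string {
	return fmt.Sprintf("%s cf-app[%s]: %s",
		e.timestamp.Format(time.RFC3339), e.appGUID, e.message)
}

func main() {
	// A channel stands in for the Firehose subscription a real nozzle holds.
	firehose := make(chan envelope, 1)
	firehose <- envelope{appGUID: "example-app-guid", timestamp: time.Now(), message: "request handled"}
	close(firehose)

	for e := range firehose {
		// A real nozzle would forward this line to its downstream target and
		// could also look up the app name for the GUID via CAPI.
		fmt.Println(toSyslog(e))
	}
}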

Additional Information

Loggregator interacts with other components to support the above chain:

  • etcd
  • syslog_drain_binder
  • CAPI
  • UAA


Impact

Please note that Loggregator can lose messages when the volume of messages exceeds what it can handle.

Please see this article for more information.

"Log messages" is used here as a proxy term for all messages transported by Loggregator, including metrics.