In this knowledge base article we will explore log transport and log loss in the modern TAS for VMs loggregator firehose.
There is excellent information already available regarding logging within TAS:
The information we will cover is supplemental to the above documentation. It is highly recommended to read that documentation first, as this knowledge base article assumes basic loggregator knowledge so that we can focus specifically on log transport and log loss. To build understanding of these topics, we will take a unique approach: starting with high-level, big-picture overviews and drilling down into the intricacies as necessary.
Though they can be correlated, there is a difference between log loss and slow consumers. Slow consumers can cause log loss, but log loss can occur even with highly optimized and properly scaled consumers. This knowledge base article aims to bridge that gap in understanding.
Before we begin, it should be known that this knowledge base article covers a lot of material. It is less of a "technical deep dive" and more of a "low altitude flight" over the landscape that is the loggregator firehose. There are a number of concepts that we will cover, and the goal is to present them with sufficient detail in ordered, digestible chunks so that log loss can be approached with a strategy.
The outline of this knowledge base article is listed below:
Purpose
The purpose of this material is to empower TAS operators with knowledge so that they can make intelligent and informed decisions while troubleshooting log loss scenarios. Additionally, this material will shed light on key information that is highly relevant when making scaling decisions for an environment's loggregator firehose setup.
There is no "one size fits all" approach to scaling loggregator firehose. Each environment is different and has different needs. Not only do environments and environmental needs differ, but TAS itself may also bring differences. For example, starting in TAS v2.13 Log Cache is no longer part of the Doppler instance group.
There are a lot of moving pieces regarding the firehose and it can seem overwhelming at first, but it doesn't have to be. Where do we start? How do we begin learning all of this? One fundamental at a time.
We will focus on fundamentals. We want to understand the travel path of a log and the components responsible for that transport. We want to understand the internal buffers used by components and their communication protocols. We want to understand what happens "under the hood" when a nozzle connects to the firehose and what happens when that nozzle is not ingesting as fast as the firehose is dispensing.
The Big Picture
The loggregator firehose is one of the ways TAS moves logs and metrics through the platform and it provides an API to allow access to these logs and metrics. See the following picture for a high level visual representation of this concept:
Logs and metrics egress from BOSH-deployed VMs and continue making their way through the firehose architecture. There are various components within the firehose that we will discuss shortly, but for now let us keep that part abstract. At the end of the firehose, an API is provided that an authorized consumer application (also known as a nozzle) can "hook up" to and request a continuous stream of select logs and metrics from the platform. What that consuming application does with the logs and metrics is completely up to that consuming application's logic. For example, there may be a nozzle that accepts all logs and forwards them downstream to another server to be processed for long-term persistence. A nozzle can be located internal or external to the environment and can be coded to do anything with the logs and metrics it receives. This is a very important point worth elaborating on further, and we will revisit this topic later in this KB, but for now just know that a nozzle accepts the incoming stream of logs and metrics and performs an action on each one individually.
Knowledge Checkpoint
The firehose architecture is designed to be a system that infinitely transports logs and metrics through the TAS environment, and it allows access to these logs and metrics by providing an API. A nozzle is an authorized consuming application that connects to this API and streams select logs and metrics in order to perform actions on them.
Logs and metrics come from a variety of sources such as applications and platform components, and they flow through the loggregator firehose in what are called envelopes. There are a variety of envelope types: for example, a Log envelope, a Counter Event envelope, a Gauge envelope, etc. More information about envelopes can be found here. Another term synonymous with envelope is message. (Message is actually a term derived from Protocol Buffers, which gRPC uses to describe the envelope's structure.) For now, just know that an envelope, message, and log typically represent the same thing within loggregator; however, envelope is the more semantically correct term. The "loss" in "log loss" is also slightly misleading because the envelopes aren't technically "lost"; they are purposely dropped to prevent back pressure within loggregator. "Log loss" is more accurately defined as "dropped envelopes".
Several components and protocols are involved in the movement of envelopes, each with its own specific purpose. Additionally, some components make use of different log buffering strategies that behave differently when dealing with back pressure. We will spend some time going in depth on all of this, but for now let's highlight why this is important to us. When investigating dropped envelopes within loggregator, it helps to know exactly where an envelope can be dropped. By understanding the travel path of an envelope along with the buffers and protocols used along that path, we can visualize how an envelope makes its way from the start of the firehose to the destination consumer application and identify points of contention that may result in envelope drops. Identifying where envelopes are dropping is one of the first tasks when investigating log loss. There are metrics available that we can use to trace this activity, and we have an entire section later in this KB devoted specifically to these metrics, but for now let's continue to learn more about envelope movement.
Envelope Travel Path
See the following picture for an overview of loggregator components responsible for transporting envelopes through the firehose.
Loggregator is an umbrella term that encompasses logic for how a TAS environment moves and provides logs and metrics throughout its ecosystem. There are several architectures available within loggregator and one of them is the firehose architecture. Another one is the shared nothing architecture. A collection of agents make both architectures possible. Technically, only one agent is needed to transport envelopes from source VM to the firehose and that is the loggregator_agent (formerly known as metron, Red Star #2).
Note - BOSH deployments choose the loggregator architectures they want to participate in via job specifications within their manifests. You may encounter BOSH deployments that have all agents, only the loggregator_agent, or even no agents at all, as participation in loggregator architectures is TAS dependent and optionally defined by deployment authors.
Red Star #1 - The loggregator-forwarder-agent job is the first hop in loggregator for an envelope if the deployment is participating in both architectures. The forwarder agent receives envelopes from various sources on the VM and forwards a copy of each envelope to its downstream subscribers. By default there are only two subscribers, the loggregator-syslog-agent and the loggregator_agent. The copy that is sent to the loggregator-syslog-agent is destined for the shared nothing architecture, and the copy that is sent to the loggregator_agent is destined for the firehose architecture.
Red Star #2 - The loggregator_agent is the second hop for an envelope as it continues making its way through the loggregator firehose. It is worth noting that if a deployment is not participating in the shared nothing architecture but is participating in the firehose architecture, then the loggregator_agent will be the only agent on the VM and will be the first stop for an envelope. The loggregator_agent opens up 5 connections to random dopplers with availability zone preference and will round robin envelopes through each connection.
Red Star #3 - The doppler job is the third hop for envelopes in the firehose architecture journey. It is responsible for accepting envelopes from many different loggregator_agent connections and routing them downstream based on select criteria.
Red Stars #4a and #4b are easier to understand with a high-level understanding of the v1 and v2 loggregator firehose. To review this, see the section at the end of this KB titled v1 and v2 Loggregator Firehose.
The potential fourth hop for an envelope in its journey through the firehose is one of the firehose API jobs.
Red Star #4a - The loggregator_trafficcontroller job is the v1 API for the firehose. Envelopes only get routed here if there are v1 nozzles present and active. If there are no v1 nozzles present, then envelopes never make it here. The reason envelopes do not make it this far by default is that loggregator_trafficcontroller only acts on behalf of nozzles. When a v1 nozzle (Red Star #5a) connects to the v1 API, the loggregator_trafficcontroller will, on behalf of the nozzle, open a connection to every doppler so that it can proxy a path for envelopes to flow from dopplers to the nozzle.
There are 2 v2 APIs, internal and external. We will cover the differences in more detail shortly but for now just know that "internal" means that a nozzle must be internal to the network to access the API and "external" means nozzles can be external to the network and still access the firehose.
Red Star #4b - The reverse_log_proxy job is the internal v2 API for the firehose. Similar to the v1 API, envelopes will only get routed here if there is a v2 nozzle present (Red Star #5b or #5c). This job can only be accessed within the network due to the protocol it uses. To expose the v2 API external to the network, a gateway to the reverse_log_proxy was created.
Red Star #4c - The reverse_log_proxy_gateway is the external v2 API for the firehose. An external v2 nozzle (Red Star #5c) connects to the reverse_log_proxy_gateway, and the gateway, on behalf of the nozzle, connects to the reverse_log_proxy.
Knowledge Checkpoint
The firehose is a chain of services that work together to infinitely transport envelopes at scale, offering ad hoc access to filterable envelope streams via APIs.
To keep the firehose system churning, each of these listed components leverages some kind of internal buffer to temporarily store envelopes as it processes each one for the next hop in the firehose.
Buffers Used By Firehose Components
A TAS ecosystem may consist of hundreds of VMs that can collectively generate millions of envelopes per minute; therefore, the firehose is architected to support high envelope throughput. In an effort to optimize this high throughput of envelopes, the components make use of buffers for flow control. Buffers act as "holding places" for envelopes while within a component. There are two buffer types that are important to understand: diodes and golang channels. These two buffer types have more similarities than differences in regard to their role within the firehose, so we will discuss them together.
Envelopes get written to the buffer for temporary storage and then they are read from the buffer for processing. As long as the read side keeps up with the write side, continuous flow is observed. If the read side is consistently slower than the write side, eventually the buffer will reach max capacity. The main difference between the two buffer types is how they behave when reaching max capacity.
A diode may drop all envelopes in the buffer if a new write is attempted while at max capacity. A golang channel will keep envelopes in the buffer while at max capacity but block new writes. For example, if a diode with a max capacity of 10,000 envelopes gets 10,000 envelopes written to it, then writing the 10,001st envelope to the diode may result in all 10,000 envelopes currently within the diode being dropped to make room for new envelopes. In contrast, if a golang channel has a max capacity of 10,000 envelopes and 10,000 envelopes get written to it, then writing the 10,001st envelope cannot take place, as the buffer will block further writes until it is read from.
Note: a diode is a ring buffer, and ring buffers normally drop only one object (the oldest) at a time to create room for new objects. However, due to how active firehose diodes are, we typically see all envelopes within the buffer dropped to create room for new envelopes. The README states: "When the diode notices it has fallen behind, it will move the read index to the new write index and therefore drop more than a single message."
Firehose components and buffers are coordinated to allow for continuous envelope flow. Additionally, there are mechanisms in place to handle the scenario where a buffer reaches max capacity. These mechanisms consist of dropping envelopes so as not to create back pressure within the firehose. If the buffer is a diode, then the envelopes within it will drop. If the buffer is a golang channel, then the component hosting the golang channel will itself drop the envelopes when a write to the buffer fails.
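To make the difference concrete, below is a small, self-contained Go sketch (not actual loggregator code) contrasting the two behaviors: a non-blocking write to a full golang channel, where the writing component drops the envelope itself, and a simplified diode-style ring buffer, where an overwrite while full drops everything the reader has not yet consumed.
package main

import "fmt"

type envelope struct{ payload string }

func main() {
	// golang channel buffer: when full, a plain send would block, so the
	// writing component does a non-blocking send and drops the envelope
	// itself when the buffer is at max capacity.
	ch := make(chan envelope, 3) // tiny capacity to force the full condition
	chanDropped := 0
	for i := 0; i < 5; i++ {
		select {
		case ch <- envelope{payload: fmt.Sprintf("env-%d", i)}:
		default: // buffer full: the component drops this envelope
			chanDropped++
		}
	}
	fmt.Println("channel: buffered", len(ch), "dropped by writer", chanDropped) // buffered 3, dropped 2

	// diode-style ring buffer (simplified): the writer never blocks; when it
	// laps the reader, the read index jumps to the write index and everything
	// unread is dropped in one shot.
	const capacity = 3
	ring := make([]envelope, capacity)
	writeIdx, readIdx, diodeDropped := 0, 0, 0
	write := func(e envelope) {
		if writeIdx-readIdx == capacity { // writer lapped the reader
			diodeDropped += writeIdx - readIdx
			readIdx = writeIdx
		}
		ring[writeIdx%capacity] = e
		writeIdx++
	}
	for i := 0; i < 5; i++ {
		write(envelope{payload: fmt.Sprintf("env-%d", i)})
	}
	fmt.Println("diode: unread", writeIdx-readIdx, "dropped on overwrite", diodeDropped) // unread 2, dropped 3
}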
Protocols Used By Firehose Components
Firehose components communicate with each other within the platform over gRPC. At the time of this writing, gRPC leverages Protocol Buffers and communicates via HTTP/2-based transport over TCP. It is highly scalable and offers speed and reliability in distributed systems. We will not dive too deep into gRPC's lower-level workings; however, we can summarize its importance within the firehose architecture. gRPC essentially allows "firehose component x" to request an infinite "stream" of messages from "firehose component y". Recall that message is a Protocol Buffers term. A message is an object defined by a message definition file, known as a proto file. In the loggregator firehose, these message definitions include an envelope.proto; see here for the v2 proto definitions. One key benefit of Protocol Buffers is its ability to serialize structured data into a compact binary payload. A message is structured data because its structure is defined by the proto file. When both the client and server implement this proto file, they understand the message definition and can communicate by serializing and deserializing messages. All firehose components, from the agents to the API jobs, implement gRPC for envelope transport.
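To illustrate the serialization point, here is a minimal Go sketch that builds a v2 Log envelope and marshals it with Protocol Buffers. The import paths and generated type names are assumptions based on the open-source go-loggregator module and may differ by version; treat it as an illustration rather than a reference.
package main

import (
	"fmt"

	"code.cloudfoundry.org/go-loggregator/v9/rpc/loggregator_v2" // assumption: module path varies by version
	"google.golang.org/protobuf/proto"
)

func main() {
	// Build a v2 Log envelope. Field and type names follow the generated
	// envelope.proto code in go-loggregator and may differ between versions.
	env := &loggregator_v2.Envelope{
		Timestamp: 1660837729850182681,
		SourceId:  "appA-guid",
		Tags:      map[string]string{"deployment": "cf"},
		Message: &loggregator_v2.Envelope_Log{
			Log: &loggregator_v2.Log{
				Payload: []byte("GET /healthz HTTP/1.1 200"),
				Type:    loggregator_v2.Log_OUT,
			},
		},
	}

	// Protocol Buffers serializes the structured envelope into a compact
	// binary payload; this is what actually travels across each gRPC hop.
	b, err := proto.Marshal(env)
	if err != nil {
		panic(err)
	}
	fmt.Printf("serialized envelope: %d bytes\n", len(b))
}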
The firehose API jobs, however, offer other protocols for connecting consumers, as summarized below:
Knowledge Checkpoint
Firehose components move logs and metrics via streams of serialized structured messages (envelopes) over gRPC. The components leverage internal buffers to maximize flow control efficiency while moving these envelopes.
Diving Deeper with Walkthrough Example
Now that we have covered envelope flow and buffering behavior, let us explore a simplified scenario accompanied by a series of pictures to help solidify our understanding.
Imagine this is a fresh TAS v2.11 install where there is only one AZ, one Doppler VM, one Traffic Controller VM, and no nozzles present. TAS, depending on the version, may actually deploy with a few nozzles initially, such as log-cache-nozzle, but for this exercise let us imagine all nozzles are stopped and not consuming from the firehose.
Here is how envelope flow looks in this scenario:
Component jobs within BOSH-deployed VMs generate logs and metrics, which become envelopes that ingress into the loggr-forwarder-agent job. The way the components get envelopes to the loggr-forwarder-agent varies depending on the job itself and how it is set up. We will not dive too deep into this aspect, but a diagram of it can be found here and here (for example, the cloud controller utilizes the statsd injector to get envelopes to the forwarder agent, whereas the diego executor sends directly to the forwarder agent and the gorouter leverages the udp-forwarder). Our focus starts with the envelopes that ingress into the loggr-forwarder-agent. Notice the black dots on the sides of the firehose components in the above picture; these dots represent sockets.
The loggr-forwarder-agent ingests envelopes from its upstream log provider (statsd injector, udp-forwarder, or executor..etc) and places them in its internal diode (write side of buffer). An additional process within the loggr-forwarder-agent infinitely reads envelopes from the diode (read side of the buffer) and forwards copies of each envelope to each of its interested downstream consumers. From the perspective of the loggr-forwarder-agent, the interested downstream consumers are the loggr-syslog-agent (Shared Nothing Architecture, not pictured above) and the loggregator_agent (Firehose Architecture).
This is a good time to address how "consumer" is an overloaded term in firehose concepts. A "consumer" or "consuming application" can be synonymous with nozzle. Additionally, an "interested downstream consumer" can be the next "hop" in the firehose architecture. The upstream log provider is the previous "hop" in the firehose architecture.
The loggregator_agent ingests envelopes from its upstream log provider (the loggr-forwarder-agent) and places them in its internal diode. Another process within loggregator_agent infinitely reads envelopes from the diode and forwards copies to each interested downstream consumer. From the perspective of the loggregator_agent, there is only one downstream consumer - doppler. This means loggregator_agent does not make copies of the envelopes, as it only has one downstream consumer. Recall from Red Star #2 earlier that loggregator_agent opens up 5 connections to random dopplers with availability zone preference. Since our current example only has 1 doppler VM, all 5 connections go to that single doppler. If there were more than one doppler per availability zone in this environment, these 5 connections would be spread among the different dopplers in an effort to "spread the load".
This is a subtle but important concept to reiterate, as it applies generally to other areas within the firehose:
To help with understanding doppler we will approach it as one whole component comprised of two parts, ingress and egress. Doppler as a whole is the central "envelope router" for firehose. Doppler ingress is always active, even when there are no nozzles in the environment. Doppler egress is only active when nozzles are consuming from firehose API jobs.
Doppler ingress ingests envelopes from its upstream log providers (many different loggregator_agents from many different BOSH deployed VMs) and places them in its internal ingress diode. Another process within doppler infinitely reads envelopes from the ingress diode and forwards copies to each interested downstream consumer located in doppler egress. From the perspective of doppler ingress, there can be many different interested downstream consumers, referred to as subscriptions.
A subscription may have multiple instances, or it may be a single instance. This means the envelopes that doppler ingress reads from its ingress buffer may be copied across the multiple downstream consumers and/or sent round-robin style among those consumer instances. If there are no nozzles in the environment (like the example we are currently discussing), then an envelope's last stop occurs after it is read from the ingress diode.
For completeness, doppler has two ingress diodes (v1 and v2). Doppler performs logic on each envelope before placing it in the ingress diodes, as described in the section v1 and v2 Loggregator Firehose at the end of this KB; however, it isn't necessary to articulate the differences between v1 and v2 ingress at the moment. v1 and v2 ingress are, for the most part, the same in regard to the current topic. We will cover subtle differences as necessary, but for now we will generalize this topic as doppler ingress instead of doppler ingress v1 and doppler ingress v2.
Doppler egress is where the "plumbing magic" happens for nozzles. Before covering doppler egress, let us take a brief logical look at what it means to be a nozzle from the perspective of loggregator firehose. A nozzle connects to a firehose API and subscribes to a stream of select envelopes. Internally within firehose, a "subscription" is associated with each nozzle instance. Each subscription has a "Subscription ID", also known as a "Shard ID". Firehose components are able to keep track of each and every nozzle based on subscription Shard IDs.
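For a concrete sense of what subscribing with a Shard ID looks like from the nozzle side, here is a minimal sketch of an internal v2 nozzle written against the open-source go-loggregator client library. The import paths, RLP address, and exact signatures shown are assumptions that vary by library and TAS version, so treat this as an illustration rather than a drop-in nozzle.
package main

import (
	"context"
	"crypto/tls"
	"fmt"

	loggregator "code.cloudfoundry.org/go-loggregator/v9" // assumption: module path/version
	"code.cloudfoundry.org/go-loggregator/v9/rpc/loggregator_v2"
)

func main() {
	tlsConf := &tls.Config{} // a real nozzle loads the RLP CA, cert, and key here

	// Placeholder RLP address; the actual host and port depend on the environment.
	c := loggregator.NewEnvelopeStreamConnector("reverse-log-proxy.example.internal:8082", tlsConf)

	// The ShardId is the Subscription ID: nozzle instances sharing a ShardId
	// split one stream, while unique ShardIds each receive their own full copy.
	stream := c.Stream(context.Background(), &loggregator_v2.EgressBatchRequest{
		ShardId: "my_shard_id_3",
		Selectors: []*loggregator_v2.Selector{
			{Message: &loggregator_v2.Selector_Log{Log: &loggregator_v2.LogSelector{}}},
		},
	})

	for {
		for _, env := range stream() { // blocks until the next batch of envelopes arrives
			fmt.Println(env.GetSourceId(), string(env.GetLog().GetPayload()))
		}
	}
}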
Continuing with the example environment, let us now introduce two different v1 consumers. One has a Subscription ID with value "my_shard_id_1" (light blue) and the other has Subscription ID "my_shard_id_2" (purple).
When a nozzle connects to a firehose API, it will get its own subscription within doppler egress (and other firehose components where applicable). A nozzle's subscription within doppler egress gets a diode of size 1,000.
Let's remove those two v1 nozzles and replace them with two different internal v2 nozzles. One has a Subscription ID with value "my_shard_id_3" (lime green) and the other has Subscription ID "my_shard_id_4" (red).
Notice how the v2 firehose API job (reverse_log_proxy) creates a 10,000-envelope golang channel buffer within the subscription it manages for the nozzle instance, but the v1 firehose API job (loggregator_trafficcontroller) does not. Technically, the v1 firehose API job does make use of golang channel buffers as well, but not in a way we are interested in. We will not cover the internal v1 firehose API channel buffers, as we are more interested in envelope buffers rather than byte array buffers.
There is only one v2 firehose, but there are two APIs for it. One API is only accessible from within the network because it only communicates via gRPC, and the other API is exposed so that it is accessible from outside the network. The reverse_log_proxy is the internal v2 API and the reverse_log_proxy_gateway is the external v2 API. The reverse_log_proxy_gateway leverages the internal v2 API to proxy the connection to v2 firehose clients external to the network. Let's remove the two internal v2 API nozzles and introduce one external v2 API nozzle. This external v2 nozzle has a Subscription ID with value "my_shard_id_5" (navy blue).
The reverse_log_proxy_gateway does not need to be aware of subscriptions as it is simply a gateway proxy to the reverse_log_proxy.
Let's remove all nozzles and scale to two doppler VMs.
Notice how the loggregator_agent has two connections to the top doppler and three connections to the bottom doppler. This is a prime example of "one interested downstream consumer with multiple instances". From the perspective of loggregator_agent, the only downstream consumer is doppler. There are multiple dopplers, so loggregator_agent is going to round-robin the envelopes it pulls from its buffer among those two instances. This is a technique the firehose uses to "spread the load". Let's connect a nozzle to an API job. For the purpose of this concept, we don't need to specify v1 or v2. This nozzle has a Subscription ID with value "my_shard_id_6" (gold).
Notice how within each doppler there is a subscription for the single nozzle instance with Subscription ID "my_shard_id_6". Whenever a nozzle instance connects to the firehose, the firehose API job will, on behalf of the nozzle instance, open a connection to every doppler in the environment. Each doppler will in turn create a subscription for the nozzle instance. The reason for this logic can be appreciated if we consider that the BOSH-deployed VM on the very left is a diego cell VM. Imagine a theoretical app, appA, is located on that diego cell and is logging constantly. The nozzle instance with Subscription ID "my_shard_id_6" is interested in all logs from appA. Considering that the loggregator_agent on the diego cell opens five connections to random dopplers, this means that the logs from appA can be routed to any doppler. To ensure that the nozzle instance has a network path to receive all logs from appA, the firehose API job must open connections to each doppler.
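Below is a simplified, self-contained Go sketch of this fan-in idea. The openStream function is a hypothetical stand-in for "open a gRPC egress stream to one doppler for this subscription"; the point is only that the API job merges one stream per doppler into a single stream for the nozzle instance.
package main

import "fmt"

type envelope struct{ sourceDoppler, payload string }

// openStream is a hypothetical stand-in for opening a gRPC egress stream to
// one doppler for a given subscription; it exists only to make the sketch runnable.
func openStream(dopplerAddr, shardID string) <-chan envelope {
	ch := make(chan envelope, 1)
	ch <- envelope{sourceDoppler: dopplerAddr, payload: "appA log line for " + shardID}
	close(ch)
	return ch
}

// fanIn mimics the firehose API job: on behalf of one nozzle subscription it
// opens a stream to every doppler and merges the results into a single stream.
func fanIn(dopplerAddrs []string, shardID string) <-chan envelope {
	out := make(chan envelope, 1000)
	for _, addr := range dopplerAddrs {
		go func(addr string) {
			for env := range openStream(addr, shardID) {
				out <- env
			}
		}(addr)
	}
	return out
}

func main() {
	merged := fanIn([]string{"doppler/0", "doppler/1"}, "my_shard_id_6")
	for i := 0; i < 2; i++ {
		fmt.Println(<-merged) // appA's logs arrive regardless of which doppler routed them
	}
}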
Earlier we mentioned "Doppler egress is where plumbing magic happens for nozzles". Let's see this in action by extending the appA example. First we will scale back to one doppler instance and scale our nozzle with Subscription ID "my_shard_id_6" to two instances. Please note that some components and text are omitted from the following picture for brevity. Our main focus is how appA's logs pass through the firehose components. We have four logs from appA represented by the purple circles. We want to visualize how each log (1, 2, 3, 4) traverses the different firehose components. Additionally, we have added logic to our nozzles to forward all logs they receive to a long-term persistence solution. Recall that a nozzle can be coded to do anything with the logs and metrics it receives; in this case, the nozzles forward them to a database solution.
The two subscriptions in doppler egress have the same Subscription ID, so doppler knows it is a single consumer with two instances. Doppler will therefore have two subscription buffers it can use to place appA's envelopes in, round-robin style.
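Below is a simplified, self-contained Go sketch of this egress fan-out logic: every unique Subscription ID gets its own copy of an envelope, while instances sharing a Subscription ID receive those copies round-robin. This illustrates the concept only and is not doppler's actual implementation.
package main

import "fmt"

type envelope struct{ payload string }

// subscription represents one connected nozzle instance's egress buffer in doppler.
type subscription struct {
	shardID string
	buf     chan envelope
}

// route mimics doppler egress fan-out in a very simplified way: each unique
// Shard ID gets its own copy of the envelope, and instances that share a
// Shard ID receive those copies round-robin.
func route(env envelope, subs []subscription, rr map[string]int) {
	byShard := map[string][]subscription{}
	for _, s := range subs {
		byShard[s.shardID] = append(byShard[s.shardID], s)
	}
	for shardID, instances := range byShard {
		next := rr[shardID] % len(instances)
		rr[shardID]++
		select {
		case instances[next].buf <- env: // one copy per Shard ID, round-robined among instances
		default: // this instance's subscription buffer is full: drop for this nozzle
		}
	}
}

func main() {
	subs := []subscription{
		{shardID: "my_shard_id_6", buf: make(chan envelope, 1000)}, // instance 0
		{shardID: "my_shard_id_6", buf: make(chan envelope, 1000)}, // instance 1
	}
	rr := map[string]int{}
	for i := 1; i <= 4; i++ {
		route(envelope{payload: fmt.Sprintf("appA log %d", i)}, subs, rr)
	}
	fmt.Println(len(subs[0].buf), len(subs[1].buf)) // 2 2: appA's logs alternate between the instances
}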
Continuing the current example, let's scale up to two doppler VMs and two Traffic Controller VMs.
Loggregator_agent now has two dopplers it can connect with, so it does. Two connections went to the top doppler and three went to the bottom doppler. This is how appA's logs arrive at different dopplers. The two nozzle instances connected to firehose API jobs on two different Traffic Controller VMs. Instance 0 connected to the top Traffic Controller VM (dotted lines separate each instance 0 connection where applicable), and instance 1 connected to the bottom Traffic Controller VM.
This wraps up the example walkthrough. We covered various internal behaviors as different nozzle types come and go, along with a variety of small platform scaling scenarios. These concepts apply at the fundamental level and remain true as the platform expands.
There is a piece of logic that has been omitted from all pictures so far but should be covered quickly. It is simply that v1 nozzles (websockets) and external v2 nozzles (long-lived HTTP connections) must go through network hops such as a load balancer and gorouters to reach their firehose API jobs. Internal v2 nozzles connect directly to their firehose API job. See the following picture:
Knowledge Checkpoint
The firehose architecture leverages a subscription-based approach to track each nozzle instance and isolate the different envelope streams. Horizontally scaling a consumer (whether it is a nozzle or an interested downstream consumer firehose "hop") results in more instances of that consumer, which yields more pathways and buffers to disperse envelopes to.
Metrics and Concepts
In this section we will cover a variety of metrics and concepts that are relevant to tracking firehose performance.
The main firehose Key Performance Indicators (KPIs) are documented here and here. We will briefly review them here as well, along with additional concepts surrounding the metrics.
Important Note
Counter envelopes are emitted with a "total" field and a "delta" field. The delta field in the envelope is softly deprecated and should not be relied upon. Instead, implement some form of rate and/or change-in-value logic within your observability system to generate the values that are relevant to your observability strategy. For example, a PromQL expression to generate the rate for the doppler egress dropped metric may look like the following:
rate(dropped{source_id='doppler', direction='egress'}[1m])
A lot of the metrics we will be reviewing are counter envelopes.
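If your observability system does not provide a built-in rate function, the same idea can be implemented by hand from successive "total" values. Below is a minimal Go sketch that treats a counter reset (for example, after a component restart) as a new baseline:
package main

import (
	"fmt"
	"time"
)

// rate converts two successive observations of a counter envelope's "total"
// field into envelopes per second. A counter reset (such as a component
// restart) makes the new total smaller than the old one; in that case the
// new total is treated as the delta.
func rate(prevTotal, curTotal uint64, interval time.Duration) float64 {
	if curTotal < prevTotal {
		prevTotal = 0
	}
	return float64(curTotal-prevTotal) / interval.Seconds()
}

func main() {
	// e.g. doppler "dropped" (direction=egress) totals observed one minute apart
	fmt.Printf("%.1f envelopes/sec dropped\n", rate(125000, 131000, time.Minute)) // 100.0
}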
Origin: loggregator.doppler
Name: ingress
Concepts:
Origin: loggregator.doppler
Name: dropped
Tag: direction=ingress
Concepts:
Origin: loggregator.doppler
Name: dropped
Tag: direction=egress
Concepts:
2022-08-18T15:55:29.850182681Z Dropped 1000 envelopes (v1 buffer) ShardID: ThrottleNozzle
Origin: loggregator.doppler
Name: subscriptions
Concepts:
Origin: loggregator.rlp
Name: ingress
Concepts:
Origin: loggregator.rlp
Name: egress
Concepts:
Origin: loggregator.rlp
Name: dropped
Concepts:
Origin: loggregator.rlp
Name: subscriptions
Concepts:
Origin: loggregator.trafficcontroller
Name: doppler_proxy.slow_consumer
Concepts:
2022-08-18T15:55:34.977144913Z Doppler Proxy: Slow Consumer from 10.225.28.139:1502 using /firehose/ThrottleNozzle
Origin: loggregator_forwarder_agent
Name: dropped
Concepts:
Origin: loggregator.metron
Name: dropped
Concepts:
Origin: system_metrics_agent
Name: system_load_1m
Concepts:
Origin: system_metrics_agent
Name: system_mem_percent
Concepts: