Application logs contain 'connection errors' and 'dropped envelopes' to internal TAS components, such as Doppler.

Article ID: 413747

Updated On:

Products

VMware Tanzu Application Service

Issue/Introduction

Your application deployed on TAS/TPCF is throwing errors similar to the following:

Error occurred while processing stream: finishConnect(..) failed: Connection timed out: doppler.<EXAMPLE.com:1234>
Invalid handshake response getStatus: 404 Not Found
233 recvAddress(..) failed: Connection reset by peer
-----------
Dropped 1000 envelopes (v1 buffer) ShardID: <xyz>
Unable to connect to doppler (xx.xx.xxx.xxx): rpc error: code = Canceled desc = context canceled

Cause

Applications deployed to TAS generally do not connect to internal TAS/TPCF components, such as Doppler. 

Because the application logs contain connection errors to TAS components, the application is actively scraping logs and functioning as a Firehose-style consumer. The errors seen in the app logs therefore relate directly to the limits outlined in this KB article.

Apps consuming large volumes of telemetry through RLP/Doppler, as in this scenario, are impacted when platform safeguards kick in to preserve performance; those safeguards include dropping logs and disconnecting consumers.
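
For illustration, a custom Firehose-style consumer typically streams envelopes from the RLP Gateway over Server-Sent Events. The Go sketch below is a minimal, illustrative example only; the log-stream.SYSTEM_DOMAIN URL, OAuth token, shard ID, and envelope-type selectors are placeholders and are not taken from this article.

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Illustrative values: LOG_STREAM_URL would be something like
	// https://log-stream.SYSTEM_DOMAIN and TOKEN a valid UAA OAuth token.
	gateway := os.Getenv("LOG_STREAM_URL")
	token := os.Getenv("TOKEN")

	// /v2/read is the RLP Gateway streaming endpoint; shard_id groups
	// consumers so envelopes are balanced across instances sharing the ID.
	url := gateway + "/v2/read?shard_id=example-shard&log&counter"

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Envelopes arrive as Server-Sent Events ("data: {...}" lines).
	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		// A disconnect here is often the platform shedding load.
		log.Printf("stream ended: %v", err)
	}
}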

Resolution

Regarding the "Dropped Envelopes" messages reported from the Doppler VMs, this is normal behavior built in to ensure platform health and availability, especially under high log volume. A custom application deployed to TAS whose purpose is to scrape and consume large volumes of log/metric data via the RLP Gateway (i.e., /firehose/v2) presents a workload that is particularly sensitive to the throughput and connection limits inherent in the Loggregator/Doppler architecture.

The official recommendation is to send the logs to a remote server, or to use VMware's TPCF OpenTelemetry Collector for TAS for VMs to egress traces, metrics, and logs, rather than relying on a custom log-scraping application.
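
As an illustration of the remote-server approach, application logs can be forwarded to an external endpoint through a user-provided syslog drain service bound to the app; the service name, application name, and drain URL below are placeholders:

cf create-user-provided-service my-log-drain -l syslog-tls://logs.example.com:6514
cf bind-service my-app my-log-drain
cf restage my-app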

Additional Information

1.) When the RLP gateway or Doppler instances are overwhelmed by sustained burst traffic, they will start shedding load by:

      • Dropping envelopes to preserve Doppler availability
      • Temporarily disconnecting Firehose consumers (like your app) to free up resources; consumers should expect this and reconnect, as in the sketch after this list
      • Logging events such as "dropped messages" and/or "connection closed" as an indication that the system (TAS/platform) is actively managing overloaded conditions.
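
Because these disconnections are expected platform behavior under load, a custom consumer should treat them as transient and reconnect with backoff rather than failing outright. The Go sketch below is a minimal illustration under that assumption; streamOnce stands in for whatever streaming call the consumer actually makes and is not an API from this article.

package main

import (
	"errors"
	"log"
	"math/rand"
	"time"
)

// streamOnce stands in for a single streaming session against the RLP
// Gateway; it returns when the platform closes the connection.
func streamOnce() error {
	// ... consume envelopes until the server disconnects ...
	return errors.New("connection closed by server")
}

func main() {
	backoff := time.Second
	const maxBackoff = time.Minute

	for {
		err := streamOnce()
		if err == nil {
			backoff = time.Second // clean exit; reset the delay
			continue
		}

		// Add jitter so many consumers do not reconnect in lockstep.
		sleep := backoff + time.Duration(rand.Int63n(int64(backoff/2)+1))
		log.Printf("stream dropped (%v); reconnecting in %v", err, sleep)
		time.Sleep(sleep)

		if backoff < maxBackoff {
			backoff *= 2
		}
	}
}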

2.) Helpful Information:

  • V1 logging uses wss://doppler.SYSTEM_DOMAIN, which resolves via external DNS, passes through the Gorouter and load balancer, and reaches the Doppler VMs.
  • The doppler.service.cf.internal address is internal (BOSH DNS) and is typically used by platform components or nozzles, bypassing the Gorouter altogether.
  • Port 8081 is consistent with Loggregator V1, but the endpoint (doppler. vs. log-stream.) and protocol (WebSocket vs. gRPC) are what define the Firehose version -- not the DNS label.
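
As a quick way to see which path a consumer is using, the hostnames above can be resolved from the environment where the consumer runs; the external doppler.SYSTEM_DOMAIN name resolves via public DNS, while doppler.service.cf.internal resolves only where BOSH DNS is available (for example, on platform VMs). The Go sketch below is illustrative only; replace doppler.SYSTEM_DOMAIN with the foundation's actual system domain.

package main

import (
	"fmt"
	"net"
)

func main() {
	// Hostnames from the notes above; the first requires the real system
	// domain, the second resolves only where BOSH DNS is available.
	hosts := []string{
		"doppler.SYSTEM_DOMAIN",
		"doppler.service.cf.internal",
	}

	for _, h := range hosts {
		addrs, err := net.LookupHost(h)
		if err != nil {
			fmt.Printf("%s: not resolvable from here (%v)\n", h, err)
			continue
		}
		fmt.Printf("%s resolves to %v\n", h, addrs)
	}
}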