Identifying and Reducing Backlog in Wavefront & DX OpenExplore

Products

Observability DX OpenExplore

Issue/Introduction

Ingestion Backlog can impact Alerting and Dashboard accuracy in your observability environment. This guide provides steps to identify the presence of a backlog and strategies to mitigate it.

Out-of-the-box dashboards and alerts are provided to help monitor your proxy ingestion and alert your team when data is not arriving as expected. These can be used as-is, or cloned and modified to focus on ingestion from individual segments of your business.

Resolution

Identifying Backlog - Dashboards & Charts.

The Tanzu Observability Service and Proxy Data dashboard is an out-of-the-box dashboard that provides visibility into ingestion.

This dashboard is divided into sections and is used every day by customers and Support alike to identify ingestion patterns and diagnose issues.

Proxies Overview Section
Proxy Troubleshooting Section
Ingest Rate by Source
Analyze for Unwanted Metrics
Filtering and Blocking Ingestion at the Proxy (and Operator) Level.

If you find data that is no longer needed, you can reduce that ingestion by creating filters and/or preprocessor rules.

To learn how to monitor Wavefront proxies, see Monitor Wavefront Proxies.

Proxies Overview Section

This section allows you to see Proxy Backlog Sizes for Points, Histograms and Spans.

Review the "Info" section on the left for definitions of the metrics used in this section.

For additional information on these and other internal ~proxy. metrics, see the article Monitor Wavefront Proxies, Section: Proxy Internal Metrics.

Other charts in this section provide details on the causes of backlog, for example Max Burst Rate and Queuing Reasons.

Received Points/Distributions/Spans Max Burst Rate.
- See Section: Proxy Queue Reasons: Bursty Data.
Blocked Points/Distributions/Spans per Second.
Queued Points/Distributions/Spans per Second.
Queuing Reasons.
- See Section: Proxy Queue Reasons: Memory Buffer
Blocked Metrics.
- For information on logging blocked metrics for review, see Section: Step 2: Enable Blocked Point Logging and Examine Blocked Points
Spans Sampled by Policies per Second.
- For more information on Spans see Manage Sampling Policies.
- Note: at this time DX OpenExplore does not support Trace data. This feature is expected to be included in a future release.

Proxy Troubleshooting Section

Monitor CPU and memory resources and network latency, and view preprocessor rule information that can affect the performance of your proxies.

Review the "Info" section on the left for definitions of the metrics used in this section.

Ingest Rate by Source

If your combined Points-Per-Second (PPS) ingestion rate is above your Collector PPS rate, the additional PPS will be pushed back to the proxy, where it will be buffered until it can be resent.
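The arithmetic behind this push-back can be sketched as follows. This is a simplified model, not the proxy's actual behavior: it assumes constant rates and ignores any resend limits the proxy applies. The function names and the sample rates are hypothetical.

```python
def backlog_growth(ingest_pps: float, collector_pps: float, seconds: float) -> float:
    """Points buffered at the proxy after `seconds` of sustained ingestion.

    Only the portion of the ingest rate above the collector rate is queued.
    """
    excess_pps = max(0.0, ingest_pps - collector_pps)
    return excess_pps * seconds


def drain_time(backlog_points: float, ingest_pps: float, collector_pps: float) -> float:
    """Seconds needed to resend a backlog, given the spare collector capacity.

    Returns infinity when there is no spare capacity, because the backlog
    cannot shrink while ingestion is at or above the collector rate.
    """
    headroom_pps = collector_pps - ingest_pps
    if headroom_pps <= 0:
        return float("inf")
    return backlog_points / headroom_pps


# Hypothetical numbers: 12,000 PPS ingested against a 10,000 PPS collector rate
# for 10 minutes, followed by a drop to 8,000 PPS.
queued = backlog_growth(12_000, 10_000, 600)
print(queued)                              # 1200000.0 points buffered
print(drain_time(queued, 8_000, 10_000))   # 600.0 seconds to clear the backlog
```

In this model, a backlog only clears when ingestion falls below the collector rate, which is why reducing unwanted ingestion is the main mitigation.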

Reviewing the sources that are sending data to your proxy allows you to identify any that are sending unexpectedly high amounts.
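If you export the per-source rates from the chart, a small script can help spot outliers. The following is a minimal sketch, assuming you already have a mapping of source name to PPS. The source names, the rates, and the "five times the median" threshold are all hypothetical choices for illustration, not product defaults.

```python
from statistics import median


def flag_heavy_sources(pps_by_source: dict[str, float], factor: float = 5.0) -> list[tuple[str, float]]:
    """Return sources whose PPS exceeds `factor` times the median, highest first."""
    if not pps_by_source:
        return []
    typical = median(pps_by_source.values())
    heavy = [(source, rate) for source, rate in pps_by_source.items() if rate > factor * typical]
    return sorted(heavy, key=lambda item: item[1], reverse=True)


# Hypothetical per-source rates copied from the Ingest Rate by Source chart.
rates = {"web-01": 120, "web-02": 130, "db-01": 150, "batch-07": 4200}
print(flag_heavy_sources(rates))  # [('batch-07', 4200)]
```

The median is used instead of the mean so that one very noisy source does not raise the baseline it is compared against.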

Analyze for Unwanted Metrics:

Our Developers and Technical writers have published multiple articles to help customers identify unused metrics. Here are a few for your review.

Create, Customize, and Optimize Dashboards
See Section: Identify Unused Dashboards.
Improve PPS Usage and Prevent Overage
See Sections:
- (Optional) Clone Namespace Explorer and Create Custom Charts.
- Drill Down with wftop and Spy API
- Use the REST API to Compare Ingested and Accessed Metric.

Filtering and Blocking Ingestion at the Proxy (and Operator) Level.

You can use YAML files to block unwanted data at your container proxies, whether they are deployed with Kubernetes, Docker, or the Operator.

For Kubernetes, create a custom ConfigMap to block traffic via preprocessor rules.
- Use a Custom ConfigMap to Include Preprocessor Rules
Observability for Kubernetes Operator allows you to control metrics both at the operator level and at the Proxy level.
- By denying controlPlane and peripheral metrics at the Operator level, you reduce the CPU and memory resources needed by the proxy to handle its preprocessor rules.

Non-containerized proxies also use preprocessor rules to allow you to block metrics at the proxy level.

Note: It's a best practice to regularly review and refine filtering rules to prevent unwanted metrics from contributing to high PPS.
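Before adding a block rule, it can help to estimate how much ingestion a candidate pattern would remove. The sketch below is an offline approximation only: it uses Python's `re` module rather than the proxy's own regex engine, and it assumes the pattern must match the whole metric name. The metric names, rates, and pattern are hypothetical.

```python
import re


def estimate_blocked_share(metric_pps: dict[str, float], pattern: str) -> float:
    """Fraction of total PPS carried by metrics whose full name matches `pattern`."""
    regex = re.compile(pattern)
    total = sum(metric_pps.values())
    if total == 0:
        return 0.0
    blocked = sum(rate for name, rate in metric_pps.items() if regex.fullmatch(name))
    return blocked / total


# Hypothetical per-metric rates and a candidate pattern for blocking dev metrics.
sample = {"dev.app.requests": 300, "dev.app.latency": 200, "prod.app.requests": 500}
print(estimate_blocked_share(sample, r"dev\..*"))  # 0.5
```

Treat the result as a rough guide and confirm the effect on the proxy dashboards after the rule is deployed.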

Additional Information

For questions or further assistance, please contact Broadcom Support.