Prometheus API call for pool analytics returns a 200 response without any pool data

Products

VMware Avi Load Balancer

Issue/Introduction

When the Prometheus pool analytics endpoint is queried, the API returns an HTTP 200 response code, but the response body contains no pool data.

Environment

Avi Load Balancer and Prometheus

Cause

Scalability Issue: The current architecture exhibits limitations in efficiently gathering metrics data when the number of configured pools is large.
API Timeout: The fixed timeout duration between the AnalyticsPortal and the metricsapi_server is 30 seconds. This duration is insufficient for completing metrics collection operations when a high number of pools are present.
Non-Configurable Parameter: As of the current release [31.1.x], the API timeout value is a static, hard-coded parameter within the controller software and cannot be adjusted by the user or administrator.

Resolution

Mitigating API Timeout for Prometheus Metrics Retrieval from Avi Controller

1. Introduction

This document outlines a resolution to address the API timeout issues encountered when retrieving Prometheus metrics data from the Avi Controller, particularly in environments with a large number of configured pools. The current fixed API timeout of 30 seconds between the AnalyticsPortal and the metricsapi_server often proves insufficient for comprehensive data collection from numerous entities.

2. Problem Statement Review

As previously identified, the primary challenge is that fetching metrics data for a high volume of pools exceeds the default 30-second API timeout. This limitation results in incomplete or failed metrics retrieval, impacting the visibility and monitoring capabilities for large-scale deployments. The lack of a configurable timeout parameter on the controller exacerbates this issue.

3. Resolution: Chunked API Calls for `entity_id` Filtering

The proposed resolution involves making multiple, segmented API calls to the Avi Controller's Prometheus metrics endpoint. Each call will leverage the entity_id query parameter to filter the requested data, with a strict limit on the number of entity_id values included in a single request.

3.1. Approach Overview

Instead of attempting to retrieve metrics for all pools in a single API call that is likely to time out, the total set of pool UUIDs is divided into smaller batches. A separate API request is then made for each batch, so that the number of entity_id values in any given URL stays below the threshold that triggers a timeout.

3.2. API Endpoint Format

The API calls should adhere to the following format:

https://<controller_ip>/api/analytics/prometheus-metrics/pool?entity_id=<pool_uuid1>,<pool_uuid2>,...,<pool_uuidN>

Where:

<controller_ip>: The IP address or hostname of the Avi Controller.
pool: Specifies that metrics for pool entities are being requested.
entity_id: A comma-separated list of pool UUIDs for which metrics are desired.
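
As an illustration of the format above, the following minimal Python sketch assembles such a URL from a list of pool UUIDs. The function name `build_pool_metrics_url`, the hostname, and the UUID values are hypothetical placeholders, not part of the product.

```python
from urllib.parse import quote


def build_pool_metrics_url(controller, pool_uuids):
    """Return the Prometheus pool metrics URL for the given pool UUIDs."""
    if not pool_uuids:
        raise ValueError("at least one pool UUID is required")
    # Commas separate the UUIDs, so each UUID is percent-encoded on its own
    # and the commas themselves are left untouched.
    entity_ids = ",".join(quote(uuid, safe="") for uuid in pool_uuids)
    return (
        f"https://{controller}/api/analytics/prometheus-metrics/pool"
        f"?entity_id={entity_ids}"
    )


# Placeholder controller name and UUIDs, for illustration only.
print(build_pool_metrics_url("controller.example.com", ["pool-1111", "pool-2222"]))
```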

3.3. `entity_id` Limitation

To effectively circumvent the API timeout, the maximum number of entity_id values included in a single API call must be limited to 100. This limit is crucial for ensuring that each individual request completes within the existing 30-second timeout window.
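
To estimate how many requests this limit implies, divide the number of pools by 100 and round up. The short sketch below shows the arithmetic; the pool count of 1,250 is an invented figure used only as an example.

```python
import math

MAX_ENTITY_IDS = 100  # per-request limit stated in this section


def requests_needed(pool_count, batch_size=MAX_ENTITY_IDS):
    """Return how many API calls are needed to cover pool_count pools."""
    if pool_count < 0 or batch_size < 1:
        raise ValueError("pool_count must be >= 0 and batch_size >= 1")
    return math.ceil(pool_count / batch_size)


print(requests_needed(1250))  # 13 calls: twelve full batches plus one of 50
```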

3.4. Constructing Multiple API Calls

To implement this resolution, the following steps should be followed:

Obtain All Pool UUIDs: Gather the complete list of UUIDs for all pools from which metrics are required.
Batching: Divide the comprehensive list of pool UUIDs into batches, with each batch containing a maximum of 100 UUIDs.
Generate API Calls: For each batch, construct a unique API URL using the format described in Section 3.2, populating the entity_id parameter with the comma-separated UUIDs from that specific batch.
Execute Calls: Execute each generated API call sequentially or in parallel (with appropriate rate limiting to avoid overwhelming the controller) to retrieve the metrics data.
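
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a supported tool: the list of pool UUIDs is assumed to have been gathered already, the helper names (`chunked`, `build_urls`, `fetch`, `collect_pool_metrics`) are hypothetical, and `fetch` assumes that HTTP basic authentication is permitted on the controller. Adapt the authentication and TLS handling to the environment.

```python
import base64
import urllib.request

MAX_ENTITY_IDS = 100  # per-request limit from Section 3.3


def chunked(items, size=MAX_ENTITY_IDS):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_urls(controller, pool_uuids, size=MAX_ENTITY_IDS):
    """Build one URL per batch, using the format from Section 3.2."""
    base = f"https://{controller}/api/analytics/prometheus-metrics/pool"
    return [
        f"{base}?entity_id={','.join(batch)}"
        for batch in chunked(pool_uuids, size)
    ]


def fetch(url, username, password, timeout=60, context=None):
    """GET one URL. Basic auth is an assumption about the controller setup."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8")


def collect_pool_metrics(controller, pool_uuids, fetch_one):
    """Call fetch_one for each batch URL in turn and join the text payloads."""
    payloads = [fetch_one(url) for url in build_urls(controller, pool_uuids)]
    return "\n".join(payloads)
```

A caller might run it as `collect_pool_metrics("controller.example.com", uuids, lambda url: fetch(url, "user", "password"))`, where the hostname and credentials are placeholders. The calls here are sequential, which is the simplest way to avoid overloading the controller. Because the symptom described in this article is a 200 response without pool data, it may also be worth checking that each batch returns a non-empty payload.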

Additional Information

https://techdocs.broadcom.com/us/en/vmware-security-load-balancing/avi-load-balancer/avi-load-balancer/31-1/monitoring-and-operability-guide/nsx-advanced-load-balancer-monitoring-components/avi-vantage-prometheus-integration.html