Prometheus Target Status shows the scrape as failed, with the error message "body size limit exceeded".
Greenplum Command Center (GPCC) Metric Exporter, Prometheus Integration
The table_metrics endpoint provides high-resolution visibility into every table within the Greenplum database. To maintain consistency with GPCC, the exporter generates 25 distinct metric types per table (refer to official GPCC documentation for the list of metrics). Because of the volume of this data, it is served via a dedicated API endpoint (/table_metrics) with a recommended 5-minute scrape interval.
The fundamental issue lies in the transformation of data from a database format to a monitoring format:
When the exporter renders metrics for every row in the table inventory (gpmetrics.gpcc_table_info), the generated payload can reach upwards of 400 MB.
Result: This high cardinality causes the response payload to exceed the default body-size limit configured in Prometheus, triggering the "body size limit exceeded" error and causing the scrape to fail.
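As a rough sanity check, you can estimate the payload size before tuning anything. The numbers below (table count and average exposition-line size) are illustrative assumptions, not values taken from GPCC; substitute your own counts:

```shell
# Back-of-envelope payload estimate for the /table_metrics endpoint.
# TABLES and AVG_LINE_BYTES are assumptions -- adjust to your cluster.
TABLES=100000          # number of tables in the Greenplum database
METRICS_PER_TABLE=25   # distinct metric types GPCC exports per table
AVG_LINE_BYTES=160     # assumed average size of one metric line (name + labels + value)

SERIES=$((TABLES * METRICS_PER_TABLE))
PAYLOAD_MB=$((SERIES * AVG_LINE_BYTES / 1024 / 1024))

echo "series=${SERIES} approx_payload_mb=${PAYLOAD_MB}"
# prints: series=2500000 approx_payload_mb=381
```

With these assumed inputs the estimate lands near the 400 MB figure observed above, which is well past the Prometheus default limit.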
To prevent Prometheus from failing the scrape due to size constraints, you can manually increase the body-size limit for this specific job.
Edit your prometheus.yml configuration file and add or update the body_size_limit parameter for the job. We recommend a value of 512MB (or higher, depending on your table count) to provide adequate headroom.
scrape_configs:
  - job_name: 'greenplum_sandpit_table_inventory'
    metrics_path: '/table_metrics'
    scrape_interval: 5m
    scrape_timeout: 1m
    body_size_limit: 512MB   # Increased to handle high cardinality
    static_configs:
      - targets: ['<hostname>:6162']
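After editing, it is worth validating the file before reloading Prometheus. The configuration path below is an assumption; adjust it to wherever your prometheus.yml actually lives, and note that the HTTP reload endpoint only works if Prometheus was started with --web.enable-lifecycle:

```shell
# Validate the edited configuration (path is an assumption).
promtool check config /etc/prometheus/prometheus.yml

# Apply the change without a restart (requires --web.enable-lifecycle);
# otherwise, restart the Prometheus service instead.
curl -X POST http://localhost:9090/-/reload
```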
Note: If your Grafana instance encounters an OOM condition after this tuning, refer to KB 433733 for the workaround and resolution.
If you are using an Nginx reverse proxy, enabling gzip compression is the most effective way to reduce network overhead. Prometheus metrics are text/plain and highly compressible; gzip can shrink a 400 MB payload to approximately 30-40 MB in transit.
This step is optional but strongly recommended. It does not reduce the "body size" that Prometheus checks (the limit applies to the decompressed payload), but the smaller transfer reduces the likelihood of a scrape timeout.
Nginx Example Configuration:
location /prometheus {
    proxy_pass http://127.0.0.1:9095;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    gzip on;
    gzip_types text/plain;   # Prometheus metrics are text/plain
    gzip_proxied any;
    gzip_min_length 1000;
}

location /table_metrics {
    gzip on;
    gzip_types text/plain;   # Prometheus metrics are text/plain
    gzip_proxied any;
    gzip_min_length 1000;
}

To verify the payload size and HTTP response code without dumping the body into your terminal, use the following command:
curl -sS -H "Authorization: Bearer xxxxx" \
     --write-out '{"http_code":"%{http_code}","size_download_bytes":%{size_download}}\n' \
     --output /dev/null \
     http://<hostname>:6162/table_metrics
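If you enabled gzip on the proxy, the same --write-out technique can confirm it is working: when curl advertises gzip support but is told not to decompress, size_download reflects the compressed bytes actually transferred. The hostname and bearer token are placeholders, as in the command above:

```shell
# Request the endpoint with gzip negotiation and report transferred bytes.
# <hostname> and the bearer token are placeholders.
curl -sS -H "Authorization: Bearer xxxxx" \
     -H "Accept-Encoding: gzip" \
     --write-out 'compressed_bytes=%{size_download}\n' \
     --output /dev/null \
     http://<hostname>:6162/table_metrics
```

Comparing this figure against the uncompressed size from the previous command shows the actual on-the-wire savings (roughly 10x for typical metric text).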