We have implemented OpenTelemetry metric ingestion across various applications and have noticed that metrics are displayed inconsistently in the metric view.
OTel metrics produced by the span connector (spanmetrics) normally contain no data points when no corresponding traffic is detected during the 15-second reporting period. This is expected behavior that saves network, CPU, and memory overhead for both the otel-collector and the backend services.
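With low traffic (fewer than 4 transactions per minute), this can result in a "broken" graph in the DXO2 metric viewer: a minute contains four 15-second reporting periods, so with fewer than 4 transactions at least one period has no data point. This is normal with the out-of-the-box (OOTB) settings.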
Note that a graph whose points are not connected is a sign that the reporting interval is not known and that data points are sent only when they are available.
If it is absolutely necessary to see zero-count metrics every 15 seconds, the following otel-collector configuration changes can be used:
1) Disable delta temporality in the span connector (spanmetrics) settings.
2) Add the cumulative-to-delta processor (cumulativetodelta) to the metrics pipeline.
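Below is a snippet from a sample config.yaml. The key changes are the cumulativetodelta entry under processors, the removed aggregation_temporality setting in the spanmetrics connector, and the cumulativetodelta entry in the metrics pipeline. Only these changes need to be applied to an existing configuration.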
...
processors:
  batch:
    send_batch_max_size: 500
    send_batch_size: 100
    timeout: 10s
  cumulativetodelta:
    # Convert all cumulative metrics to delta
connectors:
  spanmetrics:
    dimensions:
      - name: db.system
      - name: http.method
      - name: http.request.method
      - name: messaging.system
      - name: rpc.system
      - name: peer.service
    metrics_flush_interval: 15s
    # Remove: aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - otlphttp/dxo2
        - spanmetrics
    metrics:
      receivers:
        - otlp
        - spanmetrics
      processors:
        - batch
        - cumulativetodelta
      exporters:
        - otlphttp/dxo2
        - debug
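
The effect of the two changes can be illustrated with a minimal Python sketch. It is not the collector's implementation: the function name cumulative_to_delta and the sample counts are hypothetical. It assumes that, with cumulative temporality, the span connector reports the running total at every flush interval even when no new spans arrive, so the difference between consecutive totals is 0 for idle periods. It also assumes the first point is skipped because it has no baseline.

```python
from typing import List


def cumulative_to_delta(cumulative: List[int]) -> List[int]:
    """Convert running totals (one per reporting period) into per-period deltas."""
    deltas: List[int] = []
    previous = None
    for total in cumulative:
        if previous is None:
            # The first point has no baseline; this sketch skips it (assumption).
            previous = total
            continue
        # Delta for this period is the growth of the running total.
        deltas.append(total - previous)
        previous = total
    return deltas


# Hypothetical running call count reported every 15 seconds.
# No traffic arrives in the third and fourth periods, so the total stays flat.
cumulative_counts = [2, 5, 5, 5, 6]
print(cumulative_to_delta(cumulative_counts))  # [3, 0, 0, 1]
```

The idle periods show up as explicit 0 values instead of missing data points, which is what allows the metric viewer to draw a continuous line.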