Let's start with MetricSink.
Here is a sample MetricSink defined in the default namespace and sending metrics to Splunk:
apiVersion: pksapi.io/v1beta1
kind: MetricSink
metadata:
  name: my-metric-sink
  namespace: default
spec:
  inputs:
  outputs:
  - data_format: splunkmetric
    headers:
      Authorization: Splunk c797b318-...-78f2f4a3fb94
      Content-Type: application/json
    insecure_skip_verify: true
    method: POST
    splunkmetric_hec_routing: true
    type: http
    url: https://SPLUNKFQDN:8088/services/collector
Once this is applied, a new telegraf-my-metric-sink deployment is started in the specified namespace, together with a ConfigMap containing the inputs and outputs defined in the MetricSink.
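A quick way to apply the manifest and confirm the resulting objects exist is shown below; the manifest file name is just an example, and the plural resource name metricsinks assumes the MetricSink CRD is registered on the cluster (it is on PKS clusters with sink resources enabled):
kubectl apply -f metric-sink.yaml
kubectl get metricsinks -n default
kubectl get deployments,configmaps -n default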
At this point the sink is ready to collect pod application metrics. To instruct Telegraf to collect them, we have to add annotations to our pods/deployment specifying the path and port from which the metrics should be scraped. The following example shows the annotations for an nginx deployment which exposes application metrics on port 9913:
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-server
  template:
    metadata:
      annotations:
        prometheus.io/path: "/metrics"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9913"
      labels:
        app: nginx-server
    spec:
      containers:
      - name: nginx-demo
        image: nginx-vts-exporter
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 250m
          requests:
            cpu: 20m
        ports:
        - containerPort: 80
          name: http
        - containerPort: 9913
          name: metrics
Telegraf will find the defined annotations and automatically start collecting metrics from the specified address and port.
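To double-check that the annotations actually made it onto the running pods, they can be read back with kubectl, using the app=nginx-server label from the example above:
kubectl get pods -n default -l app=nginx-server -o jsonpath='{.items[*].metadata.annotations}'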
To troubleshoot a MetricSink:
1. Follow the logs:
kubectl logs -n default telegraf-ID -f
and watch for issues such as "failed to connect" or "unauthorized" messages.
2. If you do not see errors in the logs, you can further confirm that Telegraf is sending data by logging into the worker node running the pod and verifying with a packet capture:
SSH to the worker node, then monitor traffic for SPLUNKFQDN and/or port 8088:
tcpdump -n port 8088 and host SPLUNKFQDN (or its IP)
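To find out which worker node is running the Telegraf pod before you SSH in, list the pods with the wide output; the NODE column shows the worker:
kubectl get pods -n default -o wide | grep telegraf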
Next is ClusterMetricSink. The definition is similar; please note there is no namespace defined:
apiVersion: pksapi.io/v1beta1
kind: ClusterMetricSink
metadata:
  name: my-cluster-metric-sink
spec:
  inputs:
  outputs:
  - data_format: splunkmetric
    headers:
      Authorization: Splunk c797b318-63f4-4dda-a928-78f2f4a3fb94
      Content-Type: application/json
    insecure_skip_verify: true
    method: POST
    splunkmetric_hec_routing: true
    type: http
    url: https://SPLUNKFQDN:8088/services/collector
Once this is applied, a ConfigMap for Telegraf is updated in the pks-system namespace with the provided inputs and outputs. In our case, since we use the default inputs, Telegraf will be configured to retrieve metrics from Kubernetes.
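The change can be verified by inspecting the Telegraf ConfigMap in pks-system; the ConfigMap name used in the second command is an assumption, so list the ConfigMaps first to get the exact name in your cluster:
kubectl get configmaps -n pks-system
kubectl describe configmap telegraf -n pks-system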
ClusterMetricSink uses a pre-provisioned daemonset in the pks-system namespace to scrape Kubernetes statistics; these stats cover a wide range, from single-pod usage to cluster-wide utilization.
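Listing the daemonset and its pods confirms that an agent is running on each worker node:
kubectl get daemonsets -n pks-system
kubectl get pods -n pks-system -o wide | grep telegraf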
For troubleshooting purposes, the logs of these Telegraf pods can be checked in the same way, and a tcpdump on the worker nodes can confirm that the metrics are being sent across.
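For example (the pod name is a placeholder; take an actual name from the pod listing above):
kubectl logs -n pks-system telegraf-ID -f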