Telegraf pods go into CrashLoopBackOff after a cluster upgrade

Article ID: 403671

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

After upgrading four clusters to TKGi 1.19, the telegraf pods running in the pks-system namespace of one cluster went into CrashLoopBackOff.

The pod logs show the following error:

kubectl logs telegraf-ID -n pks-system

2025-07-07T07:20:11Z I! Loading config: /etc/telegraf/telegraf.conf
2025-07-07T07:20:11Z I! Starting Telegraf 1.29.5 brought to you by InfluxData the makers of InfluxDB
2025-07-07T07:20:11Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 6 secret-stores
2025-07-07T07:20:11Z I! Loaded inputs: kubernetes net_response (25x)
2025-07-07T07:20:11Z I! Loaded aggregators:
2025-07-07T07:20:11Z I! Loaded processors:
2025-07-07T07:20:11Z I! Loaded secretstores:
2025-07-07T07:20:11Z I! Loaded outputs: wavefront
2025-07-07T07:20:11Z I! Tags enabled: cluster_name=NAME host=5a35cdb7-xxxx-xxxx-xxxx-e9e2338385bf
2025-07-07T07:20:11Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"5a35cdb7-xxxx-xxxx-xxxx-e9e2338385bf", Flush Interval:20s
2025-07-07T07:20:11Z E! [telegraf] Error running agent: could not initialize input inputs.net_response: address EXAMPLE.COM: missing port in address

The affected cluster has a clustermetricsink configured.

Environment

TKGi 1.19 

TKGi 1.20

Cause

The newer version of Telegraf requires a port to be defined for input endpoints.

Modifying the clustermetricsink generates an updated version of the telegraf configmap and reloads the daemonset.

An entry such as inputs.net_response: address EXAMPLE.COM without a port was allowed in the previous version of Telegraf. With the updated version, an address cannot be used without a port defined.

Resolution

To bring the telegraf pods back to a running state, the clustermetricsink has to be updated:

Option 1: Fix each entry of the form inputs.net_response: address EXAMPLE.COM by changing it to inputs.net_response: address EXAMPLE.COM:<PORT>

Correct every address that has no port defined.

Option 2: Take a backup of the metric sink with kubectl get clustermetricsink default -o yaml > clustermetricsink_default.yaml, then delete the addresses that do not have a port. If needed, restore the clustermetricsink later with the ports corrected as described in Option 1.
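
Finding the addresses that still lack a port can be sketched in shell. This is a minimal sketch and not part of the product: the function name list_addresses_missing_port is hypothetical, and it assumes each address appears in the backed-up YAML on its own line as address: HOST or address: HOST:PORT. Bare IPv6 addresses are not handled correctly.

```shell
#!/bin/sh
# Hypothetical helper: print every "address" value in a YAML file
# that does not end in :<port>.
# Assumption: addresses sit on lines of the form "address: host[:port]",
# optionally quoted and optionally preceded by a list dash.
list_addresses_missing_port() {
  # $1: path to the backed-up clustermetricsink YAML
  sed -n 's/^[[:space:]-]*address:[[:space:]]*//p' "$1" |
    tr -d "\"'" |
    grep -Ev ':[0-9]+[[:space:]]*$'
}

# Usage, after taking the backup described in Option 2:
#   list_addresses_missing_port clustermetricsink_default.yaml
```

Each line printed is an address that needs either a port added (Option 1) or removal (Option 2). If nothing is printed, every address found already ends in a port.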