Fluent Bit pods are in “CrashLoopBackOff” in TKGi v1.19.1


Article ID: 373835


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This issue is observed in freshly installed TKGi v1.19.1 clusters as well as in clusters upgraded from TKGi v1.18 to v1.19.1.

The fluent-bit pods flap between the “Running” and “CrashLoopBackOff” states and accumulate a high restart count.

# kubectl get pods -n pks-system | grep fluent-bit
fluent-bit-r2pdx     1/2     CrashLoopBackOff   3979 (3m3s ago)    14d
fluent-bit-ptf9w     2/2     Running            3785 (4m37s ago)   14d
fluent-bit-ptzr8     2/2     Running            3768 (7m22s ago)   14d
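
To narrow down which pods are affected at a given moment, the pod listing can be filtered on the READY column. The following is a minimal sketch, not part of the original procedure; the function name not_ready_fluentbit_pods is hypothetical, and it assumes the column layout shown in the output above.

```shell
#!/bin/bash
# Print the name and status of every fluent-bit pod whose READY column
# reports fewer ready containers than expected (for example 1/2).
# Reads the output of `kubectl get pods --no-headers` on stdin.
# The function name is hypothetical and used only for illustration.
not_ready_fluentbit_pods() {
  awk '$1 ~ /^fluent-bit-/ {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1, $3
       }'
}
```

It could be used as `kubectl get pods -n pks-system --no-headers | not_ready_fluentbit_pods`, followed by `kubectl describe pod <pod-name> -n pks-system` to see which container in the pod is restarting.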

Environment

Tanzu Kubernetes Grid Integrated Edition v1.19.1

Cause

The syslog-output plugin for fluent-bit in TKGi v1.19.1 is built with Go v1.22.2, which is affected by a known Go issue.

For details, see GitHub issue golang/go #62440.

Resolution

Revert the fluent-bit version from v2.2.3 to v1.9.3 by patching the fluent-bit DaemonSet in the pks-system namespace to use the earlier image.
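
Before changing the DaemonSet, it can help to record the images it currently uses so that the change can be undone later. The following is a minimal sketch under that assumption, not part of the original procedure; the function name show_fluentbit_images is hypothetical.

```shell
#!/bin/bash
# Print "<container-name><TAB><image>" for each init container and each
# container in the fluent-bit DaemonSet, so the current values are on record.
# The function name is hypothetical and used only for illustration.
show_fluentbit_images() {
  kubectl get daemonset fluent-bit -n pks-system \
    -o jsonpath='{range .spec.template.spec.initContainers[*]}{.name}{"\t"}{.image}{"\n"}{end}{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}
```

Running `show_fluentbit_images > fluentbit_images_before.txt` against the target cluster keeps a copy of the original image references (the file name is only an example).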

1. Create a patch script

  • Save the script as patch_fluentbit.sh
  • Set the FLUENTBIT_IMAGE value in the script to the image path for your environment
#!/bin/bash
# Usage: ./patch_fluentbit.sh <kube-context>
# Stop on the first error so a failed context switch never patches another cluster.
set -euo pipefail

FLUENTBIT_IMAGE="gcr.io/cf-pks-golf/fluent-bit-out-syslog:1.18.4" # !!! UPDATE !!!

# Require the target context as the first argument.
context="${1:?Usage: $0 <kube-context>}"
kubectl config use-context "$context"
echo "Patching cluster in context: $context"
# Point the first container and the first init container at FLUENTBIT_IMAGE.
kubectl patch daemonset fluent-bit -n pks-system --type='json' -p="[
  {'op': 'replace', 'path': '/spec/template/spec/containers/0/image', 'value': '${FLUENTBIT_IMAGE}'},
  {'op': 'replace', 'path': '/spec/template/spec/containers/0/imagePullPolicy', 'value': 'IfNotPresent'},
  {'op': 'replace', 'path': '/spec/template/spec/initContainers/0/image', 'value': '${FLUENTBIT_IMAGE}'},
  {'op': 'replace', 'path': '/spec/template/spec/initContainers/0/imagePullPolicy', 'value': 'IfNotPresent'}
]"

 

2. Make the script executable

chmod +x ./patch_fluentbit.sh 

 

3. Run the script to patch the cluster

# Check the current target k8s cluster
kubectl config current-context
# > test-1

# Patch the daemonset of fluent-bit
./patch_fluentbit.sh "$(kubectl config current-context)"
#> Switched to context "test-1".
#> Patching cluster in context: test-1
#> daemonset.apps/fluent-bit patched

# Check the fluent-bit pods status
kubectl -n pks-system get pods | grep fluent-bit
#> fluent-bit-th4hd               2/2     Running   0          12m
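
Rather than polling the pod list by hand, the rollout can be awaited explicitly. The following is a minimal sketch, assuming the DaemonSet uses the RollingUpdate strategy (`kubectl rollout status` does not track OnDelete DaemonSets); the function name check_fluentbit_rollout and the 5-minute timeout are arbitrary choices for illustration.

```shell
#!/bin/bash
# Wait until the patched fluent-bit DaemonSet has finished rolling out,
# then list its pods. The function name and the timeout value are
# hypothetical choices, not part of the original procedure.
check_fluentbit_rollout() {
  kubectl rollout status daemonset/fluent-bit -n pks-system --timeout=5m &&
    kubectl get pods -n pks-system | grep fluent-bit
}
```

After the patch script completes, calling `check_fluentbit_rollout` returns a non-zero exit code if the rollout does not finish within the timeout, which makes it usable in a larger script.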

 

 

Additional Information

  • This issue is observed only in TKGi v1.19.1, not in TKGi v1.19.0.
  • This issue will be fixed in TKGi v1.19.2.