This issue is observed in freshly installed TKGi v1.19.1 clusters as well as in clusters upgraded from TKGi v1.18 to v1.19.1.
Fluent-bit pods flap between the “Running” and “CrashLoopBackOff” statuses and accumulate thousands of restarts.
# kubectl get pods -n pks-system | grep fluent-bit
fluent-bit-r2pdx 1/2 CrashLoopBackOff 3979 (3m3s ago) 14d
fluent-bit-ptf9w 2/2 Running 3785 (4m37s ago) 14d
fluent-bit-ptzr8 2/2 Running 3768 (7m22s ago) 14d
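To spot the affected pods quickly, the listing above can be filtered on its restart count. This is a minimal sketch, assuming the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE). The helper name `high_restart_pods` and its default threshold are hypothetical, not part of TKGi.

```shell
#!/bin/bash
# Hypothetical helper: print fluent-bit pods whose restart count exceeds a threshold.
# Reads `kubectl get pods` output on stdin; column 4 is assumed to be RESTARTS.
high_restart_pods() {
  local threshold=${1:-100}
  awk -v t="$threshold" '/^fluent-bit/ && $4 + 0 > t { print $1, $4 }'
}
```

It could be used as `kubectl get pods -n pks-system | high_restart_pods 1000`, which prints the name and restart count of each pod above the threshold.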
Tanzu Kubernetes Grid Integrated Edition v1.19.1
The syslog-output plugin for fluent-bit in TKGi v1.19.1 is built with Go v1.22.2.
For details, refer to the Go issue golang/go #62440 on GitHub.
Revert the fluent-bit version from v2.2.3 to v1.9.3.
1. Create a patch script named patch_fluentbit.sh
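#!/bin/bash
# Image to revert the fluent-bit containers to.
FLUENTBIT_IMAGE="gcr.io/cf-pks-golf/fluent-bit-out-syslog:1.18.4" # !!! UPDATE !!!
# The kubectl context of the target cluster is passed as the first argument.
context=$1
if [ -z "$context" ]; then
  echo "Usage: $0 <kube-context>" >&2
  exit 1
fi
# Stop if the context cannot be selected, so the wrong cluster is never patched.
kubectl config use-context "$context" || exit 1
echo "Patching cluster in context: $context"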
kubectl patch daemonset fluent-bit -n pks-system --type='json' -p="[
{'op': 'replace', 'path': '/spec/template/spec/containers/0/image', 'value': '${FLUENTBIT_IMAGE}'},
{'op': 'replace', 'path': '/spec/template/spec/containers/0/imagePullPolicy', 'value': 'IfNotPresent'},
{'op': 'replace', 'path': '/spec/template/spec/initContainers/0/image', 'value': '${FLUENTBIT_IMAGE}'},
{'op': 'replace', 'path': '/spec/template/spec/initContainers/0/imagePullPolicy', 'value': 'IfNotPresent'}
]"
2. Make the script executable with the command below
chmod +x ./patch_fluentbit.sh
3. Patch the cluster and verify the result with the commands below
# Check the current target k8s cluster
kubectl config current-context
#> test-1
# Patch the daemonset of fluent-bit
./patch_fluentbit.sh "$(kubectl config current-context)"
#> Switched to context "test-1".
#> Patching cluster in context: test-1
#> daemonset.apps/fluent-bit patched
# Check the fluent-bit pods status
kubectl -n pks-system get pods | grep fluent-bit
#> fluent-bit-th4hd 2/2 Running 0 12m
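Beyond checking the pod status by eye, the result of the patch can be confirmed with a small script. The following is a sketch only, assuming the namespace and daemonset name used in the steps above. The function names (`current_image`, `verify_image`, `wait_for_rollout`) and the 5-minute timeout are hypothetical choices, not part of TKGi.

```shell
#!/bin/bash
# Namespace and daemonset name as used in the patch steps.
NAMESPACE="pks-system"
DAEMONSET="fluent-bit"

# Print the image of the first container in the daemonset's pod template.
current_image() {
  kubectl -n "$NAMESPACE" get daemonset "$DAEMONSET" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Succeed only if the daemonset uses the expected image.
verify_image() {
  local expected=$1
  local actual
  actual=$(current_image) || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $DAEMONSET uses $actual"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}

# Wait until the patched daemonset has rolled out to all nodes
# (the 5-minute timeout is an arbitrary assumption).
wait_for_rollout() {
  kubectl -n "$NAMESPACE" rollout status "daemonset/$DAEMONSET" --timeout=5m
}
```

For example, after running the patch script, call `verify_image` with the same image value that was set in FLUENTBIT_IMAGE, followed by `wait_for_rollout`. A non-zero exit status from either function indicates that the patch was not applied or that the pods did not become ready in time.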