Fluent-bit sqldb corruption issue in Tanzu Kubernetes Grid Integrated

Article ID: 376833

Updated On: 09-09-2024

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)
VMware Tanzu Kubernetes Grid Integrated Edition
VMware Tanzu Kubernetes Grid Integrated Edition (Core)
VMware Tanzu Kubernetes Grid Integrated Edition 1.x
VMware Tanzu Kubernetes Grid Integrated Edition Starter Pack (Core)

Issue/Introduction

The fluent-bit pod crash-loops, and its logs show the following error:

Defaulted container "fluent-bit" out of: fluent-bit, ghostunnel, concat-keystore (init)
Fluent Bit v1.9.3
Copyright (C) 2015-2022 The Fluent Bit Authors
Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
https://fluentbit.io
[2024/09/03 01:26:56] [error] [sqldb] error=disk I/O error
[2024/09/03 01:26:56] [error] [input:tail:tail.0] db: could not create 'in_tail_files' table
[2024/09/03 01:26:56] [error] [input:tail:tail.0] could not open/create database
[2024/09/03 01:26:56] [error] Failed initialize input tail.0
[2024/09/03 01:26:56] [error] [lib] backend failed

Cause

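This happens when storage corruption on the worker node damages the SQLite database file (/var/log/flb_kube.db) that the fluent-bit tail input uses to track the files it reads. Fluent-bit then fails to open or create the database at startup and exits.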

Resolution

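Use bosh ssh to log in to the worker node where the failing fluent-bit pod is running, then move the corrupted database file out of the way. First, find the node that hosts the pod (add -n <namespace> if the pod is not in your current namespace):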

kubectl get po <fluent-bit pod name> -o wide

Note the worker node name in the NODE column of the output, then look up that node's IP address:

kubectl get no <node-name> -o wide
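
The two lookups above can also be combined into one step. The following is a minimal sketch, not part of the original procedure: worker_ip_for_pod is a hypothetical helper name, and the pks-system default namespace is an assumption about where fluent-bit runs in the cluster. Pass the actual namespace as the second argument if it differs.

```shell
# Hypothetical helper: print the internal IP of the node hosting a pod.
# Arguments: <pod name> [namespace]. The pks-system default is an assumption.
worker_ip_for_pod() {
  pod="$1"
  ns="${2:-pks-system}"
  # Ask the API server which node the pod is scheduled on.
  node=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}') || return 1
  [ -n "$node" ] || return 1
  # Read the node's InternalIP address from its status.
  kubectl get node "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
}

# Example: worker_ip_for_pod <fluent-bit pod name>
```

The printed address is the value to search for in the bosh vms output.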

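Then, from a terminal where the bosh CLI is available, find the worker VM that has this IP address and SSH into it:

bosh -d <service-instance-uuid> vms | grep <IP address of the worker node>
bosh -d <service-instance-uuid> ssh worker/<id>

Here worker/<id> is the instance name returned by the bosh vms command.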

On the worker node, move the file /var/log/flb_kube.db to /tmp (or another location), then restart the fluent-bit pod running on this node. Fluent-bit recreates the database file on startup, after which the pod should start running normally.
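
The move can be sketched as a small shell function, to be run as root on the worker VM (for example after sudo -i). This is an illustrative sketch, not an official procedure: move_flb_db is a hypothetical name, the .corrupt suffix is an arbitrary choice, and the handling of -wal and -shm sidecar files assumes that SQLite may have left such files next to the database.

```shell
# Hypothetical helper: move a corrupted Fluent Bit database out of the way.
# Arguments: [db path] [destination dir]; defaults follow the step above.
move_flb_db() {
  db_path="${1:-/var/log/flb_kube.db}"
  dest_dir="${2:-/tmp}"
  if [ ! -f "$db_path" ]; then
    echo "no database file at $db_path" >&2
    return 1
  fi
  name=$(basename "$db_path")
  # Keep the old file for later inspection instead of deleting it.
  mv "$db_path" "$dest_dir/$name.corrupt" || return 1
  # Assumption: SQLite sidecar files may exist next to the database.
  for suffix in -wal -shm; do
    if [ -f "$db_path$suffix" ]; then
      mv "$db_path$suffix" "$dest_dir/$name$suffix.corrupt" || return 1
    fi
  done
}

# Example, on the worker VM as root: move_flb_db
```

Once the file is moved, running kubectl delete pod <fluent-bit pod name> from a machine with cluster access restarts the pod. This assumes the pod is managed by a controller such as a DaemonSet, which recreates it on the same node.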
