fluentbit pods "error registering chunk" log errors

Article ID: 375792


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

In TKGi environments where Log Sink resources are enabled in the TKGi tile > In-Cluster Monitoring, fluentbit pods in the pks-system namespace may show numerous log entries as follows:

fluent-bit [2024/08/06 11:45:56] [error] [input:emitter:emitter_for_multiline.0] error registering chunk with tag: kube.var.log.containers.fluent-bit-<>_pks-system_fluent-bit-<>.log

You may also notice intermittent CPU usage spikes and/or OOMKilled conditions in the pods.
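
To confirm the symptom, you can grep the fluent-bit pod logs for the error and check whether the pods are being restarted (an OOMKilled reason appears under "Last State" in the pod description). The commands below are a minimal sketch, assuming the default pks-system deployment where fluentbit runs as the fluent-bit DaemonSet referenced later in this article:

  # Count occurrences of the error in each fluent-bit pod's logs
  for pod in $(kubectl get pods -n pks-system -o name | grep fluent-bit); do
    echo "$pod: $(kubectl logs -n pks-system "$pod" 2>/dev/null | grep -c 'error registering chunk')"
  done

  # Check restart counts and look for OOMKilled terminations
  kubectl get pods -n pks-system | grep fluent-bit
  kubectl describe pods -n pks-system | grep -B5 -i 'OOMKilled'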

Cause

This can be caused by a spike in logging from the applications monitored by fluentbit through LogSink/ClusterLogSink resources.
The most likely root cause is saturation of the memory buffers at two different levels: the fluentbit container level and the in_emitter plugin level.

There's an upstream fluentbit known issue covering this case: https://github.com/fluent/fluent-bit/issues/8198
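
To see the two limits currently in effect, you can inspect the container resources of the fluent-bit DaemonSet and the multiline filter configuration. This is a sketch only; it assumes the fluent-bit DaemonSet and ConfigMap names referenced in the Resolution section below, and that fluent-bit is the first container in the pod spec:

  # Memory limit of the fluentbit container (first buffer level)
  kubectl get ds fluent-bit -n pks-system \
    -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'

  # Multiline filter configuration (second buffer level); if emitter_mem_buf_limit
  # is not set, the in_emitter plugin default applies
  kubectl get configmap fluent-bit -n pks-system -o yaml | grep -A 8 'filter-multiline.conf'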

Resolution

The upstream fluentbit fix is included in v3.0.2: https://github.com/fluent/fluent-bit/pull/8473
TKGi will include an updated fluentbit version containing the fix in a future release (no ETA yet).

As a workaround, it is suggested to increase the following Memory Limits:

  • The fluentbit container Memory Limit in the TKGi tile > In-Cluster Monitoring (see Increase fluentbit container Memory Limit on TKGi tile).
    Then Apply Changes from OpsMan and upgrade the TKGi clusters.

  • The emitter_mem_buf_limit parameter under filter-multiline.conf in the fluent-bit ConfigMap in the pks-system namespace (emitter_mem_buf_limit Docs)

    E.g.

      filter-multiline.conf: |
        [FILTER]
            Name                   multiline
            Match                  *
            multiline.key_content  log
            multiline.parser       go, java, python
            emitter_mem_buf_limit  50MB

    After editing the ConfigMap, restart the fluentbit pods with the command "kubectl rollout restart ds fluent-bit -n pks-system" (an example of the full sequence is shown below).

    Please note that the changes in the ConfigMap are not persistent across TKGi upgrades.
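
    For reference, the full workaround sequence for this ConfigMap change could look like the following (assuming the ConfigMap is named fluent-bit, matching the DaemonSet):

      # Edit filter-multiline.conf and set/raise emitter_mem_buf_limit
      kubectl edit configmap fluent-bit -n pks-system

      # Restart the DaemonSet so the pods pick up the new configuration, and wait for the rollout
      kubectl rollout restart ds fluent-bit -n pks-system
      kubectl rollout status ds fluent-bit -n pks-system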

The values to which these limits need to be increased depend on many factors, such as the logging pressure from the monitored applications.
These values are empirical, so fine-tuning them can only be done through trial and error. If the initial values are not enough, keep increasing them until the problem is resolved.