In TKGi environments where Log Sink resources are enabled in the TKGi tile > In-Cluster Monitoring, the fluent-bit pods in the pks-system namespace may show numerous log entries such as the following:
fluent-bit [2024/08/06 11:45:56] [error] [input:emitter:emitter_for_multiline.0] error registering chunk with tag: kube.var.log.containers.fluent-bit-<>_pks-system_fluent-bit-<>.log
You may also notice intermittent CPU usage spikes and/or OOMKilled conditions in the pods.
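A quick way to confirm the symptom is to check the fluent-bit pod logs for the error and to look for OOMKilled restarts. The commands below are a sketch assuming a standard kubectl context; when a DaemonSet is passed to kubectl logs, a single pod is selected, so repeat per pod if needed.

# Search recent fluent-bit logs for the emitter error
kubectl logs ds/fluent-bit -n pks-system --tail=500 | grep "error registering chunk"
# List the fluent-bit pods and their restart counts
kubectl get pods -n pks-system | grep fluent-bit
# Check a specific pod for an OOMKilled termination
kubectl describe pod <fluent-bit-pod-name> -n pks-system | grep OOMKilled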
Up to TKGi v1.20.0 (fluent-bit v2.x).
This can be caused by a spike in logging from the applications monitored by fluent-bit through LogSink/ClusterLogSink resources.
The most likely root cause is saturation of the memory buffers at two different levels: the fluent-bit container level and the in_emitter plugin level.
There is a known upstream fluent-bit issue covering this case: https://github.com/fluent/fluent-bit/issues/8198
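To check which buffer limits are currently configured for the multiline filter, you can inspect the fluent-bit ConfigMap; the ConfigMap and namespace names below match the default pks-system deployment referenced in this article:

# Print the multiline filter section of the fluent-bit configuration
kubectl get configmap fluent-bit -n pks-system -o yaml | grep -A 8 "filter-multiline.conf"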
Upgrade to TKGi v1.21.0
The upstream fluent-bit fix is included in v3.0.2: https://github.com/fluent/fluent-bit/pull/8473
This issue is resolved starting from TKGi v1.21.0, which upgrades fluent-bit to v3.2.2.
Up to TKGi v1.20.0
As a workaround, it is suggested to increase the following memory buffer limit:
emitter_mem_buf_limit
in the fluent-bit ConfigMap in the pks-system namespace, under filter-multiline.conf (see the fluent-bit documentation for emitter_mem_buf_limit):
filter-multiline.conf: |
    [FILTER]
        Name                  multiline
        Match                 *
        multiline.key_content log
        multiline.parser      go, java, python
        emitter_mem_buf_limit 50MB
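One way to apply the change is to edit the ConfigMap in place (any equivalent configuration-management workflow works as well):

kubectl edit configmap fluent-bit -n pks-system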
"kubectl rollout restart ds fluent-bit -n pks-system"
The value to which this limit needs to be increased depends on many factors, such as the logging pressure generated by the applications.
It is an empirical setting, so fine-tuning can only be done through trial and error: if the initial value is not enough, keep increasing it until the problem is resolved.
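While tuning, it can help to compare the memory usage of the fluent-bit pods against the configured limits. Note that kubectl top requires a metrics pipeline such as metrics-server to be available in the cluster:

kubectl top pods -n pks-system | grep fluent-bit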