In TKGi environments where Log Sink resources are enabled in the TKGi tile > In-Cluster Monitoring, the fluent-bit pods in the pks-system namespace may show numerous log entries such as the following:
fluent-bit [2024/08/06 11:45:56] [error] [input:emitter:emitter_for_multiline.0] error registering chunk with tag: kube.var.log.containers.fluent-bit-<>_pks-system_fluent-bit-<>.log
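To confirm whether the fluent-bit pods are emitting these entries, a check along the following lines can be used (kubectl logs against the DaemonSet shows output from only one of its pods; adjust --tail as needed):

# Look for the "error registering chunk" entries in a fluent-bit pod
kubectl logs ds/fluent-bit -n pks-system --tail=500 | grep "error registering chunk"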
You may also notice intermittent CPU usage spikes and/or OOMKilled events on these pods.
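To check for OOMKilled restarts and current resource usage of the fluent-bit pods, commands similar to the following can be run (kubectl top requires the metrics-server to be available in the cluster):

# Check restart counts and the reason for the last termination
kubectl get pods -n pks-system
kubectl describe pod <fluent-bit-pod-name> -n pks-system | grep -A 3 "Last State"
# Check current CPU/memory usage (requires metrics-server)
kubectl top pod -n pks-system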
This can be caused by a spike in logging from the applications monitored by fluent-bit through LogSink/ClusterLogSink resources.
The most likely root cause is saturation of the memory buffers at two different levels: the fluent-bit container level and the in_emitter plugin level.
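To review which applications are being forwarded by fluent-bit, the LogSink/ClusterLogSink resources can be listed. The exact resource names registered for these CRDs can vary between TKGi versions, so it is safest to discover them first; the names used below are an assumption:

# Discover the exact resource names for the sink CRDs
kubectl api-resources | grep -i sink
# Then list the sinks (adjust resource names to the output above)
kubectl get logsink --all-namespaces
kubectl get clusterlogsink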
There is a known upstream fluent-bit issue covering this case: https://github.com/fluent/fluent-bit/issues/8198
The upstream fluent-bit fix is included in v3.0.2: https://github.com/fluent/fluent-bit/pull/8473
TKGi will include an updated fluent-bit version containing the fix in a future release (no ETA yet).
As a workaround, it is suggested to increase the emitter_mem_buf_limit memory limit in the fluent-bit ConfigMap in the pks-system namespace, under the filter-multiline.conf section (see the fluent-bit documentation for emitter_mem_buf_limit). For example:

filter-multiline.conf: |
[FILTER]
Name multiline
Match *
multiline.key_content log
multiline.parser go, java, python
emitter_mem_buf_limit 50MB
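A minimal sketch of applying this change, using the fluent-bit ConfigMap in pks-system mentioned above:

# Inspect the current multiline filter settings
kubectl get configmap fluent-bit -n pks-system -o yaml | grep -A 8 "filter-multiline.conf"
# Add or raise emitter_mem_buf_limit in the [FILTER] section, then save
kubectl edit configmap fluent-bit -n pks-system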
"kubectl rollout restart ds fluent-bit -n pks-system"
The values to which these limits need to be increased depend on many factors, such as the logging pressure from the applications.
They are empirical values, meaning fine-tuning can only be done through trial and error. If the initial values are not enough, please keep increasing them until the problem is resolved.
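After each increase, it is worth re-checking whether the error entries and OOMKilled events are still occurring before raising the values further, for example:

# Count recent occurrences of the error in a fluent-bit pod
kubectl logs ds/fluent-bit -n pks-system --since=10m | grep -c "error registering chunk"
# Confirm no new restarts
kubectl get pods -n pks-system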