LogSink stops sending logs for specific pods after upgrading to latest Tanzu Kubernetes Grid Integrated Edition version

Article ID: 298688


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

After upgrading to the latest version of Tanzu Kubernetes Grid Integrated Edition (TKGI), LogSink stops sending logs for specific pods.

When reviewing your environment, you see that the settings on the pods are correct and that the LogSink is configured to collect logs from the correct namespace; however, the logs for particular pods are not forwarded.

To confirm the exact output of the fluent-bit daemon, perform the following steps. Note that these steps stop log forwarding to the configured outputs while the test output is in place:

1. Make a backup of the existing config map:
kubectl get cm -n pks-system fluent-bit -o yaml > fluent-bit-cm.yaml
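When troubleshooting is complete, the original configuration can be restored from this backup, for example:
kubectl replace -f fluent-bit-cm.yaml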

2. Edit the fluent-bit ConfigMap:
kubectl edit cm -n pks-system fluent-bit

Under the outputs.conf section, remove all defined outputs and leave only the following output:
  outputs.conf: |2

    [OUTPUT]
        Name file
        Match *logg*
        File fluentbit_output.log
        Path /tmp

Here, *logg* should correspond to the name of the pod you are troubleshooting.
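For example, for the pod logger-this-is-a used in the steps below, a match pattern like the following could be used instead (assuming the default tagging, where the tag is derived from the container log file path and therefore contains the pod name):

        Match *logger-this-is-a*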

3. Perform a rollout restart of the fluent-bit DaemonSet:
kubectl rollout restart daemonset -n pks-system fluent-bit
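Optionally, wait for the restart to finish before continuing:
kubectl rollout status daemonset -n pks-system fluent-bit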

4. Identify the worker node where the pod in question is running and the corresponding fluent-bit pod on that node with this command:
kubectl get pod -A -o wide

If there are too many pods, you might have to list only the relevant namespaces instead of using -A (see the example after the output below).
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE                                   NOMINATED NODE   READINESS GATES
test          logger-this-is-a                         1/1     Running   0          21h   172.37.6.2     3f818472-7599-4630-96e1-5027f9d0bfd1   <none>           <none>
pks-system    fluent-bit-5dlcs                         2/2     Running   0          77m   172.37.5.8     3f818472-7599-4630-96e1-5027f9d0bfd1   <none>           <none>
pks-system    fluent-bit-m65w8                         2/2     Running   0          77m   172.37.5.7     937f645a-a890-4d87-9469-7dd558af6e0d   <none>           <none>
pks-system    fluent-bit-ndxkz                         2/2     Running   0          77m   172.37.5.9     d98ec4c2-8216-4ba7-a654-fddcb88785c5   <none>           <none>
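For example, to limit the listing to the namespaces used in this article:
kubectl get pod -n test -o wide
kubectl get pod -n pks-system -o wide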

Exec into the fluent-bit pod running on the same node as the pod in question and tail the test output file:

kubectl exec -it fluent-bit-5dlcs -n pks-system -- bash
root@fluent-bit-5dlcs:/# tail -f tmp/fluentbit_output.log

5. Verify that the Kubernetes metadata is visible in the output. The metadata looks similar to this:
"kubernetes":{"pod_name":"logger-this-is-a","namespace_name":"test","pod_id

Here, logger-this-is-a and test are the names of the pod and the namespace.

If this metadata is missing, you must reduce the size of the metadata for fluent-bit to behave correctly.

Environment

Product Version: 1.11

Resolution

In the fluent-bit Kubernetes filter, if the HTTP buffer is not large enough, the attempt to retrieve the pod metadata fails silently.

The metadata can be inspected by retrieving it directly from the pod with this command:
kubectl get pod POD -o jsonpath='{.metadata}' 
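To get a rough idea of the metadata size, the output can be piped through a byte count and compared against the filter's default Buffer_Size (32k in fluent-bit's kubernetes filter), for example:
kubectl get pod POD -o jsonpath='{.metadata}' | wc -c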

We recommend setting Buffer_Size to 0 for the kubernetes filter. This issue was observed with pods that define many environment variables, causing the metadata to exceed the buffer size limit.

Update the ConfigMap by adding Buffer_Size 0 and Annotations Off. Turning annotations off drastically reduces the size of the metadata, and a Buffer_Size of 0 allows the fluent-bit buffer to grow when needed.
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Buffer_Size         0
        Annotations         Off
        Merge_Log           On
        K8S-Logging.Parser  On
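After updating the ConfigMap (and restoring the outputs that were removed during the troubleshooting steps above), perform a rollout restart of the DaemonSet again so the new settings take effect:
kubectl rollout restart daemonset -n pks-system fluent-bit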

Note: Upgrade your cluster after updating the ConfigMap. If you do not do this, the settings are reverted to their original values.