LogSinks troubleshooting for TKGi



Article ID: 298605


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This article describes how to confirm that a LogSink is working as expected and that all settings are in place. The resolution walks through every component involved in creating and running a LogSink or ClusterLogSink in a Tanzu Kubernetes Grid Integrated Edition (TKGi) environment.

The checks follow the path of a log message: LogSink creation, the generated ConfigMap, its propagation to the fluent-bit pod, log monitoring, and finally the syslog output.

Environment

Product Version: 1.19+

Resolution

First, review how a LogSink works.

Creating a LogSink involves several associated components:
LogSink YAML -> fluent-bit config in the pks-system namespace -> fluent-bit daemonset workload.

Once a new LogSink is created and applied, it is written to the Fluent Bit config file, and fluent-bit is restarted to pick up the new settings.

Example of LogSink:
apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: logsinktest
  namespace: dev
spec:
  enable_tls: true
  host: SYSLOGSERVERIP
  port: 514
  type: syslog
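
When several sinks have to be created for testing, a manifest like the one above can be generated from the shell. The following is a minimal sketch, not part of the product: render_logsink is a hypothetical helper name, and its four arguments (sink name, namespace, syslog host, port) are placeholders supplied by the caller.

```shell
# Hypothetical helper: print a LogSink manifest to stdout.
# Arguments: sink name, namespace, syslog host, syslog port.
render_logsink() {
  name="$1"; namespace="$2"; host="$3"; port="$4"
  cat <<EOF
apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  enable_tls: true
  host: ${host}
  port: ${port}
  type: syslog
EOF
}

# Usage (requires access to the cluster):
#   render_logsink logsinktest dev SYSLOGSERVERIP 514 | kubectl apply -f -
#   kubectl get logsink logsinktest -n dev
```

Printing the manifest instead of applying it directly makes it possible to review the YAML before it reaches the cluster.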

The LogSink should also be reflected in the fluent-bit ConfigMap:

kubectl get configmaps -n pks-system -oyaml fluent-bit
The YAML output should contain a section like the following:
    [OUTPUT]
        Name syslog
        Match *
        InstanceName logsinktest
        Addr SYSLOGSERVERIP:514
        Namespace default
The same config details can be found inside the fluent-bit pod under /fluent-bit/etc/ (fluent-bit.conf; to view the outputs, run cat /fluent-bit/etc/outputs.conf).
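
On a cluster with many sinks the config can be long, so it may help to print only the [OUTPUT] block that belongs to one sink. The sketch below is a line-based awk filter, not a Fluent Bit parser. The helper name find_sink_output is hypothetical, and it assumes the block layout shown above (a section header in square brackets followed by key/value lines).

```shell
# Hypothetical helper: read a Fluent Bit config on stdin and print every
# [OUTPUT] block whose InstanceName equals the first argument.
find_sink_output() {
  awk -v want="$1" '
    # Print the buffered block if it matched, then reset the buffer.
    function emit() { if (hit) printf "%s", block; block = ""; hit = 0 }
    # A new section header ends the previous block.
    /^[ \t]*\[/ { emit(); keep = ($1 == "[OUTPUT]") }
    # Buffer lines of [OUTPUT] blocks and remember a matching InstanceName.
    keep { block = block $0 "\n"; if ($1 == "InstanceName" && $2 == want) hit = 1 }
    END { emit() }
  '
}

# Usage (requires access to the cluster; fluent-bit-ID is a placeholder):
#   kubectl exec -n pks-system fluent-bit-ID -- cat /fluent-bit/etc/outputs.conf \
#     | find_sink_output logsinktest
```

An empty result suggests that the sink has not been propagated to that pod yet.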
Next, locate the pod that is generating logs and confirm the logs are flowing:
kubectl logs logger -n default -f

In a new window, open a shell in the fluent-bit pod running on the same worker as the log-generating pod:

kubectl get pod logger -owide -n default
Note the worker (NODE column) the pod is running on.
kubectl get pod -n pks-system -owide
Get the ID of the fluent-bit pod running on the same worker as logger.
kubectl exec -n pks-system fluent-bit-ID -it -- bash
Run tail -f on the log file of the logging pod, found at the following location:
/var/log/pods/default_logger_76fe1c4c-0d29-4dc8-a87d-906c6bafc67f/logger/0.log

Where:
  • default is the namespace
  • logger is the pod name (the logger directory after the ID is the container name)
  • 76fe1c4c-0d29-4dc8-a87d-906c6bafc67f is the pod UID
The same messages should appear in the kubectl logs output and in the tail output; compare the two to confirm they are consistent.
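
The lookup steps above can be sketched as two small shell helpers. Both names (fluent_bit_on_node, pod_log_path) are hypothetical. The first one filters the output of kubectl get pod -owide by node name; the second one builds the log path from the namespace, pod name, pod UID and container name. The default root /var/log/pods is the usual kubelet location and is an assumption here; override LOG_ROOT if the pod mounts the logs elsewhere.

```shell
# Hypothetical helper: read `kubectl get pod -n pks-system -owide` output on
# stdin and print the fluent-bit pods scheduled on the node given as $1.
fluent_bit_on_node() {
  awk -v node="$1" '
    $1 ~ /^fluent-bit/ {
      # Scan all columns so the match does not depend on the column position.
      for (i = 2; i <= NF; i++) if ($i == node) { print $1; break }
    }
  '
}

# Hypothetical helper: build the container log path.
# Arguments: namespace, pod name, pod UID, container name.
pod_log_path() {
  printf '%s/%s_%s_%s/%s/0.log\n' "${LOG_ROOT:-/var/log/pods}" "$1" "$2" "$3" "$4"
}

# Usage (requires access to the cluster):
#   NODE=$(kubectl get pod logger -n default -o jsonpath='{.spec.nodeName}')
#   POD_UID=$(kubectl get pod logger -n default -o jsonpath='{.metadata.uid}')
#   kubectl get pod -n pks-system -owide | fluent_bit_on_node "$NODE"
#   pod_log_path default logger "$POD_UID" logger
```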

At this stage we have confirmed the config is in place and the logs are visible from the fluent-bit pod.

Next, confirm that the logs are sent to the syslog server. Identify the worker where the logger and Fluent Bit pods are running, then SSH into it.

Run the following:
tcpdump -n port 514 and host SYSLOGSERVERIP
Optionally add -w /tmp/worker.pcap to save the capture for more detailed analysis.

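If you saved a capture, open the pcap file in Wireshark and filter by content, for example by namespace name:

frame contains "default"
If the sink uses TLS (enable_tls: true), the payload is encrypted and content filters will not match; in that case rely on the packet flow instead.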
Then confirm that the logs visible from Fluent Bit are also sent to the syslog server. Example output:
tcpdump -n port 514 and host 10.213.48.132
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:56:08.000577 IP 10.213.48.220.34378 > 10.213.48.132.514: Flags [P.], seq 3119020594:3119020832, ack 1686853298, win 507, options [nop,nop,TS val 3915994610 ecr 3297830605], length 238
20:56:08.000841 IP 10.213.48.132.514 > 10.213.48.220.34378: Flags [.], ack 238, win 2498, options [nop,nop,TS val 3297832605 ecr 3915994610], length 0
20:56:09.000555 IP 10.213.48.220.34378 > 10.213.48.132.514: Flags [P.], seq 238:606, ack 1, win 507, options [nop,nop,TS val 3915995610 ecr 3297832605], length 368
20:56:09.000612 IP 10.213.48.220.34378 > 10.213.48.132.514: Flags [P.], seq 606:845, ack 1, win 507, options [nop,nop,TS val 3915995610 ecr 3297832605], length 239
20:56:09.000759 IP 10.213.48.132.514 > 10.213.48.220.34378: Flags [.], ack 606, win 2498, options [nop,nop,TS val 3297833604 ecr 3915995610], length 0
20:56:09.000777 IP 10.213.48.132.514 > 10.213.48.220.34378: Flags [.], ack 845, win 2498, options [nop,nop,TS val 3297833604 ecr 3915995610], length 0
20:56:09.000891 IP 10.213.48.220.34402 > 10.213.48.132.514: Flags [P.], seq 3122056594:3122056967, ack 1593739083, win 507, options [nop,nop,TS val 3915995610 ecr 3297829604], length 373
20:56:09.001073 IP 10.213.48.132.514 > 10.213.48.220.34402: Flags [.], ack 373, win 2416, options [nop,nop,TS val 3297833605 ecr 3915995610], length 0
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
To further confirm that the syslog server is receiving the messages, run tcpdump on the syslog server side as well.
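
For longer captures, a quick summary can be easier to read than raw tcpdump lines. The sketch below counts the packets sent to the syslog server that carry a payload (length greater than 0) and totals their bytes. The helper name count_syslog_packets is hypothetical, and the parsing assumes the TCP text format shown in the sample output above; other protocols print a different line layout.

```shell
# Hypothetical helper: summarize tcpdump text output read from stdin.
# Arguments: syslog server IP, syslog server port.
count_syslog_packets() {
  awk -v dst="$1.$2:" '
    # Field 5 is the destination (ip.port:), the last field is the payload length.
    $5 == dst && $(NF - 1) == "length" && $NF > 0 { n++; bytes += $NF }
    END { printf "%d packets, %d bytes\n", n, bytes }
  '
}

# Usage on the worker, reading a saved capture:
#   tcpdump -n -r /tmp/worker.pcap | count_syslog_packets SYSLOGSERVERIP 514
```

Applied to the sample capture above, this would report 4 packets and 1218 bytes sent to 10.213.48.132, since the four ACK-only packets coming back from the server are not counted.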