TKGI: Tail input plugin in Fluent-bit is reporting "No space left on device"


Article ID: 319334

Updated On: 11-26-2024

Products

  • VMware Tanzu Kubernetes Grid Integrated Edition
  • VMware Tanzu Kubernetes Grid Integrated Edition (Core)
  • VMware Tanzu Kubernetes Grid Integrated Edition 1.x
  • VMware Tanzu Kubernetes Grid Integrated Edition Starter Pack (Core)
  • VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

Symptoms:
  • You see errors similar to the following in the input plugin in Fluent-bit:

    No space left on device

  • When you check the Fluent-bit pods by running kubectl logs <fluent-bit-pod> -n pks-system, you see entries similar to:

    [2020/03/04 20:16:17] [error] [in_tail] could not register file into fs_events
    [2020/03/04 20:16:17] [error] [plugins/in_tail/tail_fs.c:219 errno=28] No space left on device
    [2020/03/04 20:16:17] [error] [in_tail] could not register file into fs_events
    [2020/03/04 20:16:17] [error] [plugins/in_tail/tail_fs.c:219 errno=28] No space left on device
    [2020/03/04 20:16:17] [error] [in_tail] could not register file into fs_events
  • You see that there is enough free space on /var/log inside the pod and the worker nodes also have enough free space.

  • Per the Fluent-bit GitHub issue, an open source Fluent-bit code change was made to also report the following, more detailed error:

       ENOSPC  The user limit on the total number of inotify watches was reached
               or the kernel failed to allocate a needed resource.
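
To gauge how often a given Fluent-bit pod hits this error, the log output can be filtered for the errno=28 line shown above. The following is a minimal sketch, not an official tool: the function name count_inotify_errors is hypothetical, and the commented loop assumes the Fluent-bit pod names in pks-system contain the string "fluent-bit".

```shell
# Count inotify registration failures in Fluent-bit log text read from stdin.
# Matches the fixed string from the log sample in this article.
count_inotify_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -cF 'errno=28] No space left on device' || true
}

# Example usage (assumption: pod names contain "fluent-bit"):
# for pod in $(kubectl get pods -n pks-system -o name | grep fluent-bit); do
#   printf '%s: ' "$pod"
#   kubectl logs "$pod" -n pks-system | count_inotify_errors
# done
```

A non-zero count on a pod indicates that the worker node running it has run out of inotify watches.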

 

Impact:

If this situation occurs, the underlying log files are not lost or deleted. They are still present, but Fluent-bit stops monitoring any file it could not register once the limit is reached.

This error occurs because the kernel has reached its limit on inotify watches (fs.inotify.max_user_watches), not because storage space has run out.
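
To see how close a worker node is to the limit, the watches currently in use can be counted from the Linux /proc filesystem, where each watch appears as a line starting with "inotify" in a process's fdinfo files. This is a rough sketch under stated assumptions: the function name count_inotify_watches is hypothetical, it must run as root on the worker node to read other processes' entries, and it sums watches across all users, so it is an upper bound on the per-user figure the kernel enforces.

```shell
# Count inotify watches held by all processes visible under a proc root.
# The optional first argument overrides the proc root (default: /proc).
count_inotify_watches() {
  local proc_root="${1:-/proc}"
  # Every watch is one "inotify wd:..." line in /proc/<pid>/fdinfo/<fd>.
  find "$proc_root"/[0-9]*/fdinfo -type f -exec grep -h '^inotify' {} + 2>/dev/null \
    | wc -l | tr -d ' '
}

# Example usage on a worker node (as root):
# echo "in use: $(count_inotify_watches)"
# echo "limit:  $(cat /proc/sys/fs/inotify/max_user_watches)"
```

If the number in use is at or near the limit, newly created log files cannot be registered for monitoring.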

Environment

VMware TKGI

Cause

This is expected behavior with Fluent-bit when there are not enough file descriptors and the kernel parameter fs.inotify.max_user_watches is too low for the cluster workloads.

The cluster Administrator must manage these resources and increase them as needed. The appropriate values may depend on existing cluster resources (number of nodes, etc.) and on the dynamic nature of the workloads running within the cluster.

Resolution

Workaround:

As a workaround, increase the sysctl parameter fs.inotify.max_user_watches to 16384 as a starting point and check whether this resolves the issue.

You have two options for modifying the sysctl parameter:

 

  • Method 1: Manually increase the sysctl parameter fs.inotify.max_user_watches on ALL current worker node VMs: 

IMPORTANT: This workaround will not persist across TKGI upgrades or node recreation.

     For more information, see https://github.com/fluent/fluent-bit/issues/1018

    • Check the current kernel parameter value:

      sysctl -a | grep fs.inotify.max_user_watches
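    • Increase the kernel parameter value to 16384 from the command line:

      sysctl -w fs.inotify.max_user_watches=16384
    • Reload the settings from /etc/sysctl.conf. Note that sysctl -w has already applied the new value to the running kernel, and sysctl -p applies whatever the file contains, so a different value for this parameter in the file overrides the one just set:

      sysctl -p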
    • Check the updated value:

      sysctl -a | grep fs.inotify.max_user_watches
    • ALTERNATIVELY: You can perform the same update by editing the /etc/sysctl.conf file:
      • Edit /etc/sysctl.conf as the root user

      • Locate the fs.inotify.max_user_watches parameter (add it if it is not present)

      • Set its value to 16384

      • Instruct the kernel to read the /etc/sysctl.conf file:

        sysctl -p
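
The file-edit steps above can be sketched as a small helper that either updates the existing entry or appends a new one. This is an illustrative sketch only: the function name set_sysctl_conf is hypothetical, and it assumes the settings file is /etc/sysctl.conf (the file that sysctl -p reads by default) unless another path is passed.

```shell
# Set "key = value" in a sysctl-style file, replacing an existing entry for
# the key or appending one if the key is absent.
# Usage: set_sysctl_conf <key> <value> [file]
set_sysctl_conf() {
  local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
  local pattern tmp
  # Escape the dots in the key so they match literally in the regex.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  touch "$file"
  if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
    # Rewrite through a temporary file, then copy back so the original
    # file keeps its ownership and permissions.
    tmp=$(mktemp)
    sed -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" \
      && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Example usage on a worker node (as root):
# set_sysctl_conf fs.inotify.max_user_watches 16384
# sysctl -p
```

Running the helper twice leaves a single entry in the file, so it is safe to repeat on every worker node.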

 

  • Method 2: Deploy a BOSH Add-on to implement the parameter changes. This workaround persists across upgrades and updates and is applied to all clusters.

    • For detailed steps on how to implement a BOSH Add-on that includes custom kernel parameter settings, and how to apply it to TKGI clusters, see:

https://github.com/svrc/tkgi-kernel-params

Additional Information

For more information on BOSH Add-ons, refer to:

For more information on the Fluent-bit open source issue, refer to: https://github.com/fluent/fluent-bit/issues/1018