Root filesystem full on Worker nodes from Portworx or pxctl volume logs causes Pod Eviction in PKS

Article ID: 298581

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:
Pods on the affected worker node show Evicted in the STATUS column.

For example:

$ kubectl get pods -o wide --all-namespaces
NAMESPACE            NAME                                            READY   STATUS      RESTARTS   AGE     IP              NODE                                   NOMINATED NODE
your-namespace         your-pod-name-6c785b8db7-hmmtk          0/1     Evicted     0          30d     <<your-pod-ip>>     <<your-k8s-node-id>>   <none>
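
On a busy cluster the evicted pods can be hard to spot in the full listing. The following is a minimal sketch of a filter, assuming the column layout shown above, where STATUS is the fourth column when all namespaces are listed. The function name list_evicted is hypothetical and not part of kubectl.

```shell
# list_evicted: read `kubectl get pods --all-namespaces --no-headers` output
# on stdin and print "<namespace> <pod-name>" for every pod whose STATUS
# column (4th field) is Evicted. The function name is illustrative only.
list_evicted() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get pods --all-namespaces --no-headers | list_evicted
```

The output pairs each evicted pod with its namespace, which is what the delete step in the Resolution section needs.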

After connecting to that node (for example, over SSH), you see that the root filesystem (/) is full or nearly full:

worker/<<your-worker-id>>:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       2.9G  1.6G  1.2G  58% /
worker/0871133b-5ade-487b-8d1a-17919b683e79:~$

Environment

Cause

Portworx was writing its pxd files to the root (/) filesystem on the worker nodes, specifically to /var/lib/osd/logs. Because the root filesystem (/dev/sda1) is sized at only 3 GB, enough for the operating system and kernel files, it fills up quickly once logs are written to it.
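
To confirm that this directory is what is consuming the root filesystem, its size can be compared with the figures reported by df. The following is a minimal sketch; the helper name dir_usage_kb is hypothetical, and reading the directory may require root privileges on the worker.

```shell
# dir_usage_kb DIR: print the total disk usage of DIR in kilobytes.
# Hypothetical helper name; it wraps the standard `du -sk`.
dir_usage_kb() {
  du -sk "$1" | awk '{ print $1 }'
}

# Usage on an affected worker (as root):
#   dir_usage_kb /var/lib/osd/logs
#   df -k /
```

If the value for /var/lib/osd/logs accounts for most of the Used column that df reports for /, the cause described here applies.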

Resolution

On each affected node, do the following to return it to a schedulable state quickly:

1. Temporarily move the Portworx / pxctl logs out of /var/lib/osd/logs to a larger filesystem, such as /var/vcap/datastore/.

2. Then restart all monit-managed processes on the node:

$ sudo -i
# monit restart all

3. Then delete the evicted pod:

$ kubectl delete pod <<evicted pod name>>

4. Finally, adjust your Portworx or pxctl configuration so that it no longer writes objects such as logs to the root (/) filesystem.
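
Step 1 above can be sketched as a small shell function. This is an illustration under stated assumptions, not a procedure confirmed by Portworx: the function name move_logs and the destination subdirectory osd-logs are hypothetical, and the sketch moves only regular files directly under the source directory.

```shell
# move_logs SRC DEST: move every regular file directly under SRC into DEST,
# creating DEST first if it does not exist. Subdirectories are left alone.
# Hypothetical helper; the name and behaviour are assumptions for illustration.
move_logs() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue        # skip directories and an unmatched glob
    mv "$f" "$dest"/ || return 1
  done
}

# Usage on an affected worker, as root (osd-logs is a hypothetical name):
#   move_logs /var/lib/osd/logs /var/vcap/datastore/osd-logs
```

Running df -h / again afterwards shows whether space on the root filesystem was actually freed before you continue with steps 2 and 3.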
