Pods evicted due to node low on ephemeral-storage error

Article ID: 394739

Products

VCF Automation
VMware Cloud Foundation

Issue/Introduction

Pods can crash and be evicted when the node they run on becomes low on ephemeral storage (disk space).

Environment

VMware Cloud Foundation 9.0
VMware Identity Broker 9.0
VCF Automation 9.0

Cause

If one or more pods running on a node produce a large volume of logs while the node's disk usage is already high, the node can run out of ephemeral storage before the periodic cleanup runs, which leads to pod eviction.

Resolution

If this occurs, use the Fleet Manager API to increase the size of the disk that the object store uses for support bundle data.

Method: POST

URL: https://<FleetManager hostname>/lcm/lcops/api/environments/<environment ID>/products/<product ID>/actions/invoke

Payload:

{
  "name": "configure-packages",
  "properties": {
    "namespace": "vmsp-platform",
    "name": "vmsp-platform",
    "values": {"profiles":{"overrides":{"supportBundle":{"logOffloader":{"s3BucketRequests":{"size": "<value>"}}}}}}
  },
  "ref": "/webhooks/core/vmsp/configure"
}

where <value> depends on the profile in use. The sizes below are the minimum recommended values; larger values can also be used:

Profile	Size
small	83Gi
medium	108Gi
large	183Gi
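
As an illustration only, the request described above could be assembled in Python roughly as follows. This is a minimal sketch, not an official client: the helper names `build_payload` and `build_request` are hypothetical, the `Content-Type: application/json` header is an assumption, and authentication is not covered by this article, so any credentials must be supplied through the `headers` argument according to your Fleet Manager setup. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Dict, Optional

# Minimum recommended sizes (in Gi) from the table above, keyed by profile.
MIN_SIZE_GI = {"small": 83, "medium": 108, "large": 183}


def build_payload(profile: str, size_gi: Optional[int] = None) -> dict:
    """Return the configure-packages payload for the given profile.

    Hypothetical helper: uses the minimum recommended size unless a
    larger size_gi is given, and rejects values below the minimum.
    """
    minimum = MIN_SIZE_GI[profile]
    size = minimum if size_gi is None else size_gi
    if size < minimum:
        raise ValueError(
            f"{size}Gi is below the recommended minimum of {minimum}Gi "
            f"for the {profile} profile"
        )
    return {
        "name": "configure-packages",
        "properties": {
            "namespace": "vmsp-platform",
            "name": "vmsp-platform",
            "values": {
                "profiles": {
                    "overrides": {
                        "supportBundle": {
                            "logOffloader": {
                                "s3BucketRequests": {"size": f"{size}Gi"}
                            }
                        }
                    }
                }
            },
        },
        "ref": "/webhooks/core/vmsp/configure",
    }


def build_request(
    host: str,
    environment_id: str,
    product_id: str,
    payload: dict,
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the invoke endpoint."""
    url = (
        f"https://{host}/lcm/lcops/api/environments/{environment_id}"
        f"/products/{product_id}/actions/invoke"
    )
    # Assumption: the API accepts a JSON body with this content type.
    all_headers = {"Content-Type": "application/json"}
    # Assumption: authentication headers, if required, are passed by the caller.
    all_headers.update(headers or {})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")
```

For example, `build_request("<FleetManager hostname>", "<environment ID>", "<product ID>", build_payload("medium"))` yields a request whose body sets the size to `108Gi`; passing it to `urllib.request.urlopen` would send it, provided the placeholders are replaced with real values and the required authentication is in place.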

Additional Information

Pod eviction usually causes a temporary degradation of services, followed by recovery. However, if the disk size is not increased, either proactively or as soon as the problem is observed, generated support bundles might be missing log data for the period of degradation and recovery.
