Node "NotReady" Status Due to ext4 Filesystem Read/Write Failure on PVC

Article ID: 433264

Products

VMware Telco Cloud Automation

Issue/Introduction

Nodes in TKGm clusters crash unexpectedly and show a NotReady status when viewed from the control plane.

When in this state, the affected nodes exhibit the following behaviors:

  • Complete loss of network accessibility (SSH attempts fail).

  • Unresponsive to standard interactions via the vSphere cluster interface.
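As a first check from the control plane, the NotReady nodes can be listed programmatically. The following is a minimal sketch, not part of the original article: it assumes the node list has been exported as JSON (for example with `kubectl get nodes -o json`), and the function name `not_ready_nodes` is hypothetical. It relies only on the standard Kubernetes node condition of type `Ready`, whose status is `True`, `False`, or `Unknown`.

```python
from typing import List


def not_ready_nodes(node_list: dict) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True".

    `node_list` is the parsed JSON of a Kubernetes NodeList object.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition, "False", or "Unknown" is treated as NotReady.
        # A crashed node typically reports "Unknown" once the kubelet stops posting status.
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names
```

To use it, load the exported JSON with `json.load` and pass the resulting dictionary to the function. A node in this list is only a candidate: the symptoms above (failed SSH, no response through vSphere) are still needed to distinguish this issue from other causes of NotReady.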

Environment

VMware Telco Cloud Automation 3.x

Cause

The node failure is caused by an underlying storage issue: the operating system encounters corruption or a read/write failure on an ext4-formatted filesystem attached via a Persistent Volume Claim (PVC).

Specifically, a Java application process (comm java) attempts to perform read/write operations on its mapped disk. Because the underlying block device (e.g., /dev/sdd) can no longer be read, the filesystem on it becomes inaccessible, and the node kernel locks up or crashes.
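If kernel messages from the affected node are available (for example from a console capture or from logs collected after the restart), the device and process involved can be extracted from the ext4 error lines. The sketch below is illustrative and not part of the original article. It assumes the usual shape of a Linux ext4 error message, `EXT4-fs error (device <dev>): <function>:<line>: [inode #N: ] comm <process>: <message>`, and the function name `parse_ext4_errors` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed shape of a kernel ext4 error line; the inode and block fields are optional.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"(?:inode #(?P<inode>\d+): )?"
    r"(?:block (?P<block>\d+): )?"
    r"comm (?P<comm>[^:]+): (?P<message>.*)"
)


def parse_ext4_errors(log_text: str) -> List[Dict[str, str]]:
    """Extract device, process name (comm), and message from ext4 error lines."""
    hits = []
    for line in log_text.splitlines():
        match = EXT4_ERROR.search(line)
        if match:
            hits.append({
                "device": match["device"],
                "comm": match["comm"].strip(),
                "message": match["message"],
            })
    return hits
```

A hit with device `sdd` and comm `java` would be consistent with the cause described above. Matching the device name back to a specific PVC is a separate step that this sketch does not attempt.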

Resolution

Because the node cannot be reached via SSH or standard cluster commands, a hard restart of the node is required.