Guest Cluster (TKC) in False state after Storage outage


Article ID: 421879


Updated On:

Products

Tanzu Kubernetes Runtime VMware vSphere Kubernetes Service

Issue/Introduction

When the TKC/Cluster status is checked from a Supervisor node, the READY column shows False:
  NAMESPACE  NAME          CONTROL PLANE  WORKER  KUBERNETES RELEASE NAME        AGE   READY  KUBERNETES RELEASE COMPATIBLE  UPDATES AVAILABLE
  test-ns    test-cluster  x              xx      v1.xx.x---vmware.x-fips-vkr.x  368d  False  True
The guest cluster control plane VMs cannot be reached from the Supervisor or from a jump server after the underlying storage was impacted.
The corresponding Service Engine and Pool status in the AVI UI also shows Down.
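
The READY check described above can also be scripted. The following is a minimal sketch, not an official tool: it assumes the cluster list has been exported as JSON (for example with `kubectl get tkc -A -o json`) and that each item carries a `Ready` entry under `status.conditions`, as Kubernetes resources commonly do. The function name `not_ready_clusters` and the sample data are hypothetical.

```python
import json


def not_ready_clusters(kubectl_json: str) -> list[tuple[str, str]]:
    """Return (namespace, name) for every item whose Ready condition is not "True".

    Assumes the standard Kubernetes list layout: {"items": [{"metadata": ..., "status": ...}]}.
    Items with no Ready condition at all are reported as not ready.
    """
    result = []
    for item in json.loads(kubectl_json).get("items", []):
        meta = item.get("metadata", {})
        conditions = item.get("status", {}).get("conditions", [])
        # Pick the status string of the condition whose type is "Ready", if present.
        ready = next(
            (c.get("status") for c in conditions if c.get("type") == "Ready"),
            None,
        )
        if ready != "True":
            result.append((meta.get("namespace", ""), meta.get("name", "")))
    return result


# Hypothetical sample mirroring the cluster shown in this article.
sample = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "test-ns", "name": "test-cluster"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        }
    ]
})
print(not_ready_clusters(sample))
```

A non-empty result only indicates that a cluster is not ready; the console messages in the Cause section are what tie the condition to the storage outage.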

Environment

VMware vSphere Kubernetes Service

Cause

The guest cluster nodes are in a False/NotReady state because a storage outage affected the datastore hosting the control plane and worker VMs. As a result, the Linux-based VMs switched their filesystems to read-only mode, causing the nodes to become unreachable.

The VM web console shows that the guest OS filesystem has been remounted read-only:

[30738551.551425] EXT4-fs error (device sda#) in ext4_dirty_inode:6207: Journal has aborted
[30738551.601173] EXT4-fs error (device sda#): ext4_journal_check_start:61: Detected aborted journal
[30738551.601891] EXT4-fs (sda#): Remounting filesystem read-only
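
If console or `dmesg` output has been saved from several nodes, a short script can flag which devices were remounted read-only. This is a sketch under stated assumptions: it matches only the EXT4 "Remounting filesystem read-only" message, the function name is hypothetical, and the device name `sda3` in the sample is illustrative rather than taken from a real node.

```python
import re

# Matches the kernel message emitted when ext4 remounts a device read-only.
REMOUNT_RO = re.compile(r"EXT4-fs \((?P<dev>[^)]+)\): Remounting filesystem read-only")


def devices_remounted_read_only(log_text: str) -> list[str]:
    """Return the sorted, de-duplicated device names reported as remounted read-only."""
    devices = {m.group("dev") for m in REMOUNT_RO.finditer(log_text)}
    return sorted(devices)


# Illustrative log excerpt; the device name is a placeholder.
sample_log = """\
[30738551.551425] EXT4-fs error (device sda3) in ext4_dirty_inode:6207: Journal has aborted
[30738551.601173] EXT4-fs error (device sda3): ext4_journal_check_start:61: Detected aborted journal
[30738551.601891] EXT4-fs (sda3): Remounting filesystem read-only
"""
print(devices_remounted_read_only(sample_log))
```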

Resolution

After the underlying storage issue is resolved, a reboot of the affected control plane and worker node VMs is required to bring their filesystems back from read-only to read-write state.
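
After the reboot, it can be worth confirming that no filesystem is still mounted read-only. The sketch below parses text in the Linux `/proc/mounts` format and lists ext4 mount points whose options include `ro`. The function name and sample entries are hypothetical, and limiting the check to ext4 is an assumption based on the console messages above.

```python
def read_only_mounts(mounts_text: str, fstypes: tuple[str, ...] = ("ext4",)) -> list[str]:
    """Return mount points of the given filesystem types that are mounted read-only.

    Each /proc/mounts line has the form: device mount_point fstype options dump pass
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; "ro" must be a whole option, not a substring.
        if fstype in fstypes and "ro" in options.split(","):
            read_only.append(mount_point)
    return read_only


# Illustrative /proc/mounts content; devices and mount points are placeholders.
sample_mounts = """\
/dev/sda3 / ext4 ro,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""
print(read_only_mounts(sample_mounts))
```

On a node, the same function could be fed the contents of `/proc/mounts`; an empty list suggests the ext4 filesystems are back in read-write mode.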
