Guest Cluster (TKC) in False state after Storage outage

Article ID: 421879

Updated On:

Products

Tanzu Kubernetes Runtime
VMware vSphere Kubernetes Service

Issue/Introduction

  • When the TKC/Cluster status is checked from the Supervisor node, the READY column shows False:

    NAMESPACE   NAME           CONTROL PLANE   WORKER   KUBERNETES RELEASE NAME         AGE    READY   KUBERNETES RELEASE COMPATIBLE   UPDATES AVAILABLE
    test-ns     test-cluster   x               xx       v1.xx.x---vmware.x-fips-vkr.x   368d   False   True

  • The guest cluster control plane VMs cannot be reached from the Supervisor or from a jump server after the underlying storage was impacted.

  • The corresponding Service Engine and Pool also show a status of Down in the AVI UI.
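
When many clusters share a Supervisor, the first symptom can be checked in bulk. The following is a minimal Python sketch, assuming the cluster listing has already been saved as text in the fixed-width layout shown above (column names separated by two or more spaces). The function names `parse_fixed_width` and `not_ready_clusters` are hypothetical helpers written for this illustration, not part of any VMware tooling.

```python
import re


def parse_fixed_width(table_text):
    """Parse a fixed-width column listing into a list of dicts keyed by header name."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # Header names may contain single spaces ("CONTROL PLANE"); this sketch
    # assumes columns are separated by two or more spaces.
    cols = [(m.group(0), m.start()) for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            # Each value runs from its column start to the next column start.
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def not_ready_clusters(table_text):
    """Return (namespace, name) pairs for clusters whose READY column is False."""
    return [
        (row["NAMESPACE"], row["NAME"])
        for row in parse_fixed_width(table_text)
        if row.get("READY") == "False"
    ]
```

Slicing by header offsets rather than splitting on whitespace keeps empty cells, such as a blank UPDATES AVAILABLE value, from shifting the remaining columns.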

Environment

VMware vSphere Kubernetes Service

Cause

The guest cluster nodes are in a False/NotReady state because a storage outage affected the datastore hosting the control plane and worker VMs. As a result, the Linux-based VMs switched their filesystems to read-only mode, causing the nodes to become unreachable.

  • The web console of the affected VMs shows that the guest OS filesystem has been remounted read-only:

    [30738551.551425] EXT4-fs error (device sda#) in ext4_dirty_inode:6207: Journal has aborted
    [30738551.601173] EXT4-fs error (device sda#): ext4_journal_check_start:61: Detected aborted journal
    [30738551.601891] EXT4-fs (sda#): Remounting filesystem read-only
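
If console or kernel log text has been captured from several VMs, the signature above can be searched for automatically. The sketch below is one possible approach, assuming the captured text contains ext4 messages worded like the ones quoted in this article. The function name `scan_console_text` and the returned dictionary keys are hypothetical.

```python
import re

# Patterns follow the ext4 kernel messages quoted in this article.
REMOUNT_RO = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): Remounting filesystem read-only"
)
JOURNAL_ABORT = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\).*"
    r"(?:Journal has aborted|Detected aborted journal)"
)


def scan_console_text(console_text):
    """Return devices with an aborted journal and devices remounted read-only."""
    aborted, readonly = set(), set()
    for line in console_text.splitlines():
        match = JOURNAL_ABORT.search(line)
        if match:
            aborted.add(match.group("dev"))
        match = REMOUNT_RO.search(line)
        if match:
            readonly.add(match.group("dev"))
    return {
        "journal_aborted": sorted(aborted),
        "remounted_read_only": sorted(readonly),
    }
```

A device that appears under `remounted_read_only` matches the cause described here. A device that appears only under `journal_aborted` may warrant a closer look before concluding anything.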

Resolution

After the underlying storage issue is resolved, a reboot of the affected control plane and worker node VMs is required to bring their filesystems back from read-only to read-write state.
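
After the reboot, it can be useful to confirm from inside each guest VM that no ext4 filesystem is still mounted read-only. The following is a minimal sketch, assuming Python is available in the guest OS and that the mount table is read from `/proc/mounts`, whose lines have the form `device mountpoint fstype options dump pass`. The function name `readonly_mounts` is hypothetical.

```python
def readonly_mounts(mounts_text, fstypes=("ext4",)):
    """Return mount points of the given filesystem types that carry the 'ro' option."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if fstype in fstypes and "ro" in options.split(","):
            result.append(mount_point)
    return result


# Example usage inside a guest VM (Linux only):
#     with open("/proc/mounts") as f:
#         print(readonly_mounts(f.read()))
```

An empty list suggests that the ext4 filesystems are back in read-write mode. Any mount point that is returned is still read-only.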