Supervisor and Worker nodes are down. Clusters are stuck in the Configuring state and lose connection to the API

Article ID: 404350

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In the vCenter UI, the cluster status shows errors such as the following:

 

Customized guest of Supervisor Control plane VM Configuration error (since 7/16/xxxx, 5:04:13 AM)
System error occurred on Master node with identifier xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Details: Log forwarding sync update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'configmap', 'fluentbit-config-system', '--namespace', 'vmware-system-logging', '--ignore-not-found=true', '-o', 'json']' returned non-zero exit status 1.

 

or 

 

Failed to delete RoleBinding [email protected] in namespace svc-contour-domain-c####. API server returned error 'rolebindings.rbac.authorization.k8s.io "wcp:svc-contour-domain-c####:user:vsphere.local:xxxxxxx" is forbidden: User "sso:[email protected]"' cannot delete resource "rolebindings" in API group "rbac.authorization.k8s.io" in the namespace "svc-contour-domain-c####". This operation will be retried.

 

 

Cause

The root ("/") partition is full on all three Supervisor Control Plane nodes:

# df -h | head
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        32G   32G    0  100% /
devtmpfs        7.9G     0  7.9G   0% /dev
tmpfs           7.9G  212K  7.9G   1% /dev/shm
tmpfs           3.2G   10M  3.2G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs           7.9G   14M  7.8G   1% /tmp
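
Before deleting anything, it can help to see which directories are consuming the root partition. The following is a minimal sketch, assuming GNU du and sort are available on the node. The function name top_dirs and the default path /var/log are illustrative assumptions, not part of the original procedure.

```shell
# Hypothetical helper: list the largest directories under a path.
# Sizes are reported in KiB; -x keeps du on a single filesystem so
# other mounts (for example /tmp or /run) are not counted.
top_dirs() {
    dir="${1:-/var/log}"     # path to inspect (assumed default)
    count="${2:-10}"         # number of entries to show
    du -xk --max-depth=2 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, running top_dirs /var/log 10 prints the ten largest entries up to two levels below /var/log, largest first.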

 

Resolution

Clean up disk space

   Clean up historical log files, mainly under /var/log/vmware, with the following commands:

   - Check how much disk space the journal uses, then purge entries older than two days:

   journalctl --disk-usage

   journalctl --vacuum-time=2d

   - Audit logs:

   Delete old files from the /var/log/vmware/audit directory.

   - Compress the rotated /var/log/vmware/upgrade-ctl-cli.log files:

   cd /var/log/vmware/

   tar -czvf /var/log/vmware/upgrade-ctl-cli-bck.tar.gz upgrade-ctl-cli.log.?

 

   Then delete the original rotated files, which are now stored in the archive:

   rm upgrade-ctl-cli.log.?

 

   Run these steps on all three Supervisor Control Plane nodes.
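
The audit-log and upgrade-ctl-cli steps above can be made a little safer by listing files before deleting them and by checking that the archive is readable before removing the originals. The following is a minimal sketch, assuming GNU find and tar. The function names, the dry-run mode, and the 7-day retention default are illustrative assumptions and are not part of the original procedure.

```shell
# Hypothetical helpers; names, defaults, and retention are assumptions.

# List (default) or delete regular files last modified more than N days ago.
purge_old_files() {
    dir="$1"                 # for example /var/log/vmware/audit
    days="${2:-7}"           # assumed retention; adjust to local policy
    mode="${3:-list}"        # "list" is a dry run, "delete" removes files
    if [ "$mode" = "delete" ]; then
        find "$dir" -type f -mtime +"$days" -print -delete
    else
        find "$dir" -type f -mtime +"$days" -print
    fi
}

# Archive rotated upgrade-ctl-cli logs, confirm the archive can be read
# back, and only then remove the originals. The body runs in a subshell
# so the caller's working directory is not changed.
archive_rotated_logs() (
    dir="$1"                 # directory holding the rotated logs
    archive="$2"             # absolute path of the .tar.gz to create
    cd "$dir" || return 1
    # Same glob as in the article: matches upgrade-ctl-cli.log.1 and so on,
    # but not the active upgrade-ctl-cli.log file.
    set -- upgrade-ctl-cli.log.?
    [ -e "$1" ] || { echo "no rotated logs found in $dir" >&2; return 1; }
    tar -czf "$archive" "$@" &&
        tar -tzf "$archive" >/dev/null &&
        rm -- "$@"
)
```

A possible sequence on each node, under the same assumptions:

   purge_old_files /var/log/vmware/audit 7 list

   purge_old_files /var/log/vmware/audit 7 delete

   archive_rotated_logs /var/log/vmware /var/log/vmware/upgrade-ctl-cli-bck.tar.gz

Because the archive path is a parameter, it can point to a location that still has free space if the root partition is completely full.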