Supervisor and Worker nodes are down. Clusters are stuck in the Configuring state and lose connection to the API

Article ID: 404350

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In the vCenter UI, the cluster status shows errors such as the following:

Customized guest of Supervisor Control plane VM Configuration error (since 7/16/xxxx, 5:04:13 AM)
System error occurred on Master node with identifier xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Details: Log forwarding sync update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'configmap', 'fluentbit-config-system', '--namespace', 'vmware-system-logging', '--ignore-not-found=true', '-o', 'json']' returned non-zero exit status 1.

Failed to delete RoleBinding [email protected] in namespace svc-contour-domain-c####. API server returned error 'rolebindings.rbac.authorization.k8s.io "wcp:svc-contour-domain-c####:user:vsphere.local:xxxxxxx" is forbidden: User "sso:[email protected]"' cannot delete resource "rolebindings" in API group "rbac.authorization.k8s.io" in the namespace "svc-contour-domain-c####". This operation will be retried.

Cause

The root ("/") partition is full on all three Supervisor Control Plane nodes:

# df -h | head
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        32G   32G     0 100% /
devtmpfs        7.9G     0  7.9G   0% /dev
tmpfs           7.9G  212K  7.9G   1% /dev/shm
tmpfs           3.2G   10M  3.2G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs           7.9G   14M  7.8G   1% /tmp
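
Before deleting anything, it can help to confirm which directories are actually consuming the root partition. The following is a minimal sketch, not part of the original procedure; the helper names root_usage_pct and largest_dirs are hypothetical, and it assumes standard df, du, awk and sort are available on the node.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of the product).

# Print the Use% of the filesystem holding a path as a bare integer.
# -P forces the POSIX one-line-per-filesystem output format.
root_usage_pct() {
    df -P "${1:-/}" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# List the largest directories under a path, in KiB, biggest first.
# -x keeps du on one filesystem so other mounts are not counted.
largest_dirs() {
    du -xk "${1:-/var/log}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `root_usage_pct /` prints the usage percentage of the root partition, and `largest_dirs /var/log 10` lists the ten largest directories under /var/log.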

Resolution

Clean up disk space on the root partition.

The following commands were used to purge journal logs and clean up historical log files under /var/log/vmware:

- Check journal disk usage, then purge journal entries older than two days:

journalctl --disk-usage

journalctl --vacuum-time=2d

- Audit logs:

Delete old files from the /var/log/vmware/audit directory.
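
The article does not say how old a file must be to count as "old", so the sketch below takes the age as a parameter. It is an illustration only: prune_old_files is a hypothetical helper, the 7-day value in the usage note is an assumption, and -delete relies on GNU (or BSD) find. By default it only lists the candidates, and deletes them only when --delete is passed.

```shell
#!/bin/sh
# prune_old_files DIR DAYS [--delete]
# Hypothetical helper: lists regular files under DIR last modified more than
# DAYS days ago. With --delete, it also removes them.
prune_old_files() {
    dir=$1
    days=$2
    mode=${3:-}
    if [ "$mode" = "--delete" ]; then
        # -print before -delete so each removed file is reported.
        find "$dir" -type f -mtime +"$days" -print -delete
    else
        # Dry run: report what would be removed, change nothing.
        find "$dir" -type f -mtime +"$days" -print
    fi
}
```

A cautious sequence would be `prune_old_files /var/log/vmware/audit 7` to review the list, followed by `prune_old_files /var/log/vmware/audit 7 --delete` once the list looks right.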

- Compress the rotated upgrade-ctl-cli.log files in /var/log/vmware:

cd /var/log/vmware/

tar -czvf /var/log/vmware/upgrade-ctl-cli-bck.tar.gz upgrade-ctl-cli.log.?

Then delete the original files that were added to the archive:

rm upgrade-ctl-cli.log.?
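
On a partition that is already full, the tar step may fail partway, and running rm afterwards would then lose the logs. As a hedged sketch (archive_and_remove is a hypothetical helper, not part of the original procedure), the two steps can be chained so the originals are removed only if the archive was written and can be listed back:

```shell
#!/bin/sh
# archive_and_remove ARCHIVE FILE...
# Hypothetical helper: compress FILEs into ARCHIVE, verify the archive is
# readable, and only then remove the originals.
archive_and_remove() {
    archive=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "archive_and_remove: no files given" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    tar -czf "$archive" "$@" &&
        tar -tzf "$archive" >/dev/null &&
        rm -- "$@"
}
```

Usage mirroring the commands above would be `cd /var/log/vmware && archive_and_remove upgrade-ctl-cli-bck.tar.gz upgrade-ctl-cli.log.?`. If the glob matches nothing, the literal pattern is passed to tar, which fails, so nothing is deleted.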

Run these steps on all three Supervisor Control Plane nodes.
