Error: "System error occurred on Master node with identifier #####." on a Workload Management node in the vCenter UI

Article ID: 370297

Updated On:

Products

VMware Tanzu Kubernetes Grid Service (TKGs)

Issue/Introduction

In a Supervisor cluster, one of the nodes is reported as unhealthy. In the vCenter UI, under Workload Management, the following error is displayed:

Config Status

Current Status: Error

Status Messages:

System error occurred on Master node with identifier ################################. Details: Script ['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'node', '################################', '-o', 'jsonpath=\'{.status.addresses[?(@.type == "InternalIP")].address}\''] failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'node', '################################', '-o', 'jsonpath=\'{.status.addresses[?(@.type == "InternalIP")].address}\'']' returned non-zero exit status 1...

System error occurred on Master node with identifier ################################. Details: Failed to sync changes: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'daemonset', '--namespace', 'vmware-system-logging', '-o', 'json']' returned non-zero exit status 1. Will be retried...

When the cluster nodes are checked from an SSH session on the Control Plane nodes, all of them are reported as healthy, but the error in the vCenter UI persists.

Cause

The file /etc/kubernetes/admin.conf has been modified, or the certificates embedded in it were not updated correctly or have expired. As a result, the certificate data in this file does not match what is expected, and validation fails.

Resolution

Log in to a Supervisor Control Plane node and check whether the certificates in /etc/kubernetes/admin.conf are still valid:

grep "certificate-authority-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

and

grep "client-certificate-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
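
The two checks above can also be combined into one pass. The following is a minimal sketch, not part of the original procedure: the function names `kubeconfig_field` and `check_kubeconfig_certs` are hypothetical, and it assumes `awk`, `base64` and `openssl` are available on the node. It prints the validity dates for both embedded certificates and uses `openssl x509 -checkend 0` to flag any that have already expired.

```shell
#!/bin/sh
# Hypothetical helper: print the decoded value of one "*-data" field
# from a kubeconfig file.
#   $1 = field name (for example client-certificate-data)
#   $2 = path to the kubeconfig file
kubeconfig_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2" | base64 -d
}

# Hypothetical helper: show the validity dates of the CA and client
# certificates embedded in a kubeconfig and flag expired ones.
# Returns non-zero if at least one certificate has expired.
check_kubeconfig_certs() {
    conf="${1:-/etc/kubernetes/admin.conf}"
    rc=0
    for field in certificate-authority-data client-certificate-data; do
        echo "== $field =="
        pem="$(kubeconfig_field "$field" "$conf")"
        printf '%s\n' "$pem" | openssl x509 -noout -dates
        # -checkend 0 exits non-zero when the certificate is already expired
        if ! printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null; then
            echo "$field: EXPIRED"
            rc=1
        fi
    done
    return "$rc"
}
```

After defining the functions in the shell session, calling `check_kubeconfig_certs` with no argument inspects /etc/kubernetes/admin.conf; a different kubeconfig path can be passed as the first argument.
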

If the certificates have expired, run the following command to renew them:

kubeadm certs renew all

Confirm that the certificates have been renewed:

grep "certificate-authority-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

and

grep "client-certificate-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

After the file has been updated, wait a few minutes. Then check in the vCenter UI whether the error has cleared.
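
While waiting, the kubectl calls quoted in the status messages can be replayed by hand to see whether they now exit with status 0. This is only a sketch under assumptions: `probe` and `verify_admin_conf` are hypothetical names, and the node query is simplified to list all nodes rather than the single masked node identifier from the error.

```shell
#!/bin/sh
# Hypothetical helper: run a command quietly and report its exit status.
#   $1 = label to print, remaining arguments = command to run
probe() {
    label="$1"; shift
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK   $label"
    else
        echo "FAIL $label (exit status $status)"
    fi
    return "$status"
}

# Hypothetical helper: replay the two kinds of kubectl call that appear
# in the vCenter status messages, using the same kubeconfig file.
verify_admin_conf() {
    conf="${1:-/etc/kubernetes/admin.conf}"
    probe "get node" kubectl --kubeconfig "$conf" get node
    probe "get daemonset" kubectl --kubeconfig "$conf" \
        get daemonset --namespace vmware-system-logging -o json
}
```

Running `verify_admin_conf` on the Control Plane node prints one OK or FAIL line per call. If both report OK, the commands that vCenter was failing on now succeed with this kubeconfig.
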
