Error: "System error occurred on Master node with identifier #####." on a Workload Management node in the vCenter UI
Article ID: 370297

Products

VMware Tanzu Kubernetes Grid Service (TKGs)

Issue/Introduction

In a Supervisor cluster, one of the nodes is reported as unhealthy. In the vCenter GUI, under Workload Management, the following error is displayed:

Config Status

Current Status: Error

Status Messages:

System error occurred on Master node with identifier ################################. Details: Script ['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'node', '################################', '-o', 'jsonpath=\'{.status.addresses[?(@.type == "InternalIP")].address}\''] failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'node', '################################', '-o', 'jsonpath=\'{.status.addresses[?(@.type == "InternalIP")].address}\'']' returned non-zero exit status 1...

 

System error occurred on Master node with identifier ################################. Details: Failed to sync changes: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'daemonset', '--namespace', 'vmware-system-logging', '-o', 'json']' returned non-zero exit status 1. Will be retried...

 

From an SSH session on a Control Plane node, all cluster nodes report as healthy, but the error persists in the vCenter GUI.

Cause

The file /etc/kubernetes/admin.conf has been modified, or the certificates it contains were not updated correctly or have expired. As a result, the certificates in this file fail validation.
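
One way to see whether the certificates embedded in admin.conf are consistent with each other is to verify the client certificate against the CA certificate stored in the same file. The following is a minimal sketch under assumptions, not part of the documented procedure: the function name `check_kubeconfig_chain` is hypothetical, and the file is assumed to embed both certificates as base64 data in the `certificate-authority-data` and `client-certificate-data` fields. Note that `openssl verify` also fails for an expired certificate, so a failure here does not by itself distinguish a mismatch from an expiry.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not a product tool): check that the
# client certificate embedded in a kubeconfig file was issued by the CA
# certificate embedded in the same file.
check_kubeconfig_chain() {
    conf=$1
    workdir=$(mktemp -d) || return 2

    # Decode both embedded certificates to temporary PEM files.
    grep "certificate-authority-data: " "$conf" | awk '{print $2}' | base64 -d > "$workdir/ca.pem"
    grep "client-certificate-data: " "$conf" | awk '{print $2}' | base64 -d > "$workdir/client.pem"

    # openssl verify exits 0 only if client.pem chains to ca.pem and is
    # within its validity period.
    if openssl verify -CAfile "$workdir/ca.pem" "$workdir/client.pem"; then
        rc=0
    else
        rc=1
    fi

    rm -rf "$workdir"
    return $rc
}
```

On a Supervisor Control Plane node this could be called as `check_kubeconfig_chain /etc/kubernetes/admin.conf`; a non-zero return status suggests the file needs the attention described in the Resolution section.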

Resolution

  1. Log in to a Supervisor Control Plane node and check whether the certificates in /etc/kubernetes/admin.conf are still valid:

    grep "certificate-authority-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

    and

    grep "client-certificate-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

  2. If the certificates have expired, run this command to renew them:

    kubeadm certs renew all

  3. Confirm that the certificates have been renewed:

    grep "certificate-authority-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

    and

    grep "client-certificate-data: " /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

  4. After renewing the certificates, wait a few minutes, then check in the vCenter GUI whether the error has cleared.
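
The checks in steps 1 and 3 can be wrapped in a small helper so that the same test runs before and after the renewal. The following is a minimal sketch, not part of the official procedure: the function names `kubeconfig_cert` and `report_cert` are hypothetical, and the sketch assumes each certificate field appears only once in the file. It relies on `openssl x509 -checkend 0`, which exits non-zero when the certificate has already expired.

```shell
#!/bin/sh
# Hypothetical helpers (an illustration, not a product tool): report whether
# a certificate embedded in a kubeconfig file has expired.

# Print the PEM certificate stored in field $2 of kubeconfig file $1.
kubeconfig_cert() {
    grep "$2: " "$1" | awk '{print $2}' | base64 -d
}

# Print the validity dates and a verdict for one certificate field.
# Returns 0 if the certificate is still valid, 1 if it has expired or
# could not be read.
report_cert() {
    conf=$1
    field=$2
    pem=$(kubeconfig_cert "$conf" "$field")

    echo "$field:"
    # Same date output as the commands in steps 1 and 3.
    echo "$pem" | openssl x509 -noout -dates

    # -checkend 0 succeeds only if the certificate is valid right now.
    if echo "$pem" | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
        echo "  status: valid"
        return 0
    else
        echo "  status: expired or unreadable"
        return 1
    fi
}
```

On a Supervisor Control Plane node, the two fields checked in this article could be inspected with `report_cert /etc/kubernetes/admin.conf certificate-authority-data` and `report_cert /etc/kubernetes/admin.conf client-certificate-data`.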