Supervisor Cluster shows System error occurred on Master node with identifier

Article ID: 441507

Products

VMware vSphere Kubernetes Service Tanzu Kubernetes Runtime

Issue/Introduction

In the vCenter web UI under Workload Management (vCenter 8)/Supervisor Management (vCenter 9), one or more Supervisor clusters are in Configuring or Error state.

 

Viewing the Configuring or Error status shows an error similar to one of the below:

Configured Control Plane VMs
Configuration error (since MM/DD/YYYY, HH:MM:SS AM/PM)
System error occurred on Master node with identifier <supervisor DNS name>. Details: Failed to sync changes: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'cm', '<configmap name>', '--namespace', 'vmware-system-nsx', '-o', 'yaml']' returned non-zero exit status 1.. Will be retried.

Customized guest of Supervisor Control plane VM
Configuration error (since MM/DD/YYYY, HH:MM:SS AM/PM)
System error occurred on Master node with identifier <supervisor DNS name>. Details: Log forwarding sync update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'configmap', 'fluentbit-config-system', '--namespace', 'vmware-system-logging', '--ignore-not-found=true', '-o', 'json']' returned non-zero exit status 1.

Configured Supervisor Control plane VM's Workload Network
Configuration error (since MM/DD/YYYY, HH:MM:SS AM/PM)
System error occurred on Master node with identifier <supervisor DNS name>. Details: Failed to sync changes: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'cm', '<configmap name>', '--namespace', 'vmware-system-nsx', '-o', 'yaml']' returned non-zero exit status 1.. Will be retried.
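To confirm the failure outside the scheduled sync, you can re-run the log forwarding command quoted in the second error by hand on a Supervisor control plane VM. The sketch below wraps that exact command. The function names `rerun_logging_sync_check` and `is_credentials_error` are hypothetical helpers. The matched text assumes kubectl reports the "provide credentials" error described in this article.

```shell
#!/usr/bin/env bash
# Sketch: re-run the log forwarding sync command quoted in the error message.
# Assumes it is run on a Supervisor control plane VM with access to admin.conf.
rerun_logging_sync_check() {
  /usr/bin/kubectl --kubeconfig /etc/kubernetes/admin.conf \
    get configmap fluentbit-config-system \
    --namespace vmware-system-logging --ignore-not-found=true -o json
}

# Hypothetical helper: succeeds if the error text on stdin is the
# "provide credentials" authentication failure described in this article.
is_credentials_error() {
  grep -qi 'asked for the client to provide credentials'
}
```

Example usage: `rerun_logging_sync_check 2>&1 >/dev/null | is_credentials_error && echo "admin.conf credentials rejected"`. This sends only kubectl's error output to the check.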

 

Kubectl commands fail with the following error messages:

The server has asked for the client to provide credentials.
You must be logged into the server.
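The failing commands authenticate with /etc/kubernetes/admin.conf. Checking the expiry of the client certificate embedded in that file is therefore a quick way to tie the credentials error to an expired certificate. The sketch below makes several assumptions:

- The kubeconfig embeds the certificate inline as base64 `client-certificate-data`, as kubeadm-generated admin.conf files typically do.
- `openssl` and GNU `base64` are available on the control plane VM.
- `kubeconfig_client_cert_enddate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the notAfter date of the client certificate embedded in a kubeconfig.
# Assumption: the file stores the cert inline as base64 "client-certificate-data".
kubeconfig_client_cert_enddate() {
  local kubeconfig="${1:-/etc/kubernetes/admin.conf}"
  # Take the first client-certificate-data value, decode it from base64,
  # and let openssl print the certificate's expiry (notAfter) date.
  awk '/client-certificate-data:/ { print $2; exit }' "$kubeconfig" \
    | base64 -d \
    | openssl x509 -noout -enddate
}
```

Running `kubeconfig_client_cert_enddate /etc/kubernetes/admin.conf` prints a `notAfter=` line. A date in the past indicates the client certificate has expired.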

 

When connected to the vCenter Server VM over SSH, the wcp-certmgr script's certificate check for the Supervisor cluster reports one or more expired certificates:

./certmgr certificates list

See KB: Replace vSphere Supervisor Certificates

 

When connected over SSH to the Supervisor control plane VMs, the following symptoms are observed:

  • kubectl commands fail with the below errors:
    The server has asked for the client to provide credentials.
    You must be logged into the server.
  • The latest logs for kube-apiserver show that one or more certificates have expired:
    • Check the logs of the newest kube-apiserver container:
      crictl ps -a | grep kube-apiserver
      
      crictl logs <kube-apiserver container id>
    • Or look in /var/log/pods/ for kube-system_kube-apiserver logs:
      ls /var/log/pods/ | grep kube-apiserver
      
      ls /var/log/pods/kube-system_kube-apiserver-<supervisor DNS name>/kube-apiserver/
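The two log checks above can be combined into a short sketch. It pulls recent output from the newest kube-apiserver container and filters it for certificate expiry errors. The sketch assumes:

- `crictl` is configured for the node's container runtime.
- The expiry appears as Go's standard `x509: certificate has expired` message.
- Both function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent log lines from the newest kube-apiserver container.
# Assumes crictl is configured for this node's container runtime.
latest_apiserver_logs() {
  local id
  # -a includes exited containers; --latest keeps only the most recently created one.
  id="$(crictl ps -a --name kube-apiserver -q --latest)"
  if [ -z "$id" ]; then
    echo "no kube-apiserver container found" >&2
    return 1
  fi
  # Merge stderr into stdout so error lines reach the filter.
  crictl logs --tail 500 "$id" 2>&1
}

# Hypothetical helper: keep only lines reporting an expired x509 certificate.
find_cert_expiry_errors() {
  grep -i 'x509: certificate has expired'
}
```

Example usage: `latest_apiserver_logs | find_cert_expiry_errors`. You can apply the same filter to the files listed under /var/log/pods/.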


Environment

vSphere Supervisor

Cause

The error message in the Supervisor cluster's status is the result of the system failing to run its regularly scheduled kubectl commands, which include health checks and status updates.

In this scenario, kubectl commands are failing due to expired Kubernetes certificates.

vSphere Supervisor follows the upstream Kubernetes default certificate validity period of 1 year.
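Because the certificates have a one-year lifetime, it can help to check how close each one is to expiry. The sketch below assumes a kubeadm-style layout under /etc/kubernetes/pki on the Supervisor control plane VM. This layout is an assumption, so adjust the directory if it differs. `check_cert_dir` is a hypothetical helper. It relies on `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds.

```shell
#!/usr/bin/env bash
# Sketch: report certificates that are expired or expire within N days.
# Assumption: kubeadm-style layout with *.crt files in DIR and DIR/etcd.
check_cert_dir() {
  local dir="${1:-/etc/kubernetes/pki}"
  local days="${2:-30}"
  local seconds=$(( days * 86400 ))
  local status=0
  local crt
  for crt in "$dir"/*.crt "$dir"/etcd/*.crt; do
    [ -f "$crt" ] || continue   # skip globs that matched nothing
    # -checkend exits 0 only if the cert is still valid after $seconds seconds.
    if openssl x509 -noout -checkend "$seconds" -in "$crt" >/dev/null; then
      echo "OK       $crt ($(openssl x509 -noout -enddate -in "$crt"))"
    else
      echo "EXPIRING $crt ($(openssl x509 -noout -enddate -in "$crt"))"
      status=1
    fi
  done
  return "$status"
}
```

`check_cert_dir /etc/kubernetes/pki 30` returns non-zero if any certificate is expired or expires within 30 days. The script only reports expiry; renewal still goes through wcp-certmgr.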

Expired Kubernetes certificates can lead to loss of functionality such as kubectl commands failing or kube-apiserver container process failure.

Although these expired certificates will prevent management of the VKS environment, existing workloads will continue to run without impact.

Resolution

Renew the Supervisor cluster's certificates using the wcp-certmgr script.

See KB: Replace vSphere Supervisor Certificates