Supervisor Cluster Configuration Error: Control Plane Node in NotReady State



Article ID: 423315


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Under Workload Management > Supervisors, one or more nodes report a Configuration Error. Detailed error messages in the vCenter UI may include:

Customized guest of Supervisor Control plane VM
 • Configuration error (since dd/mm/yyyy, hh:mm:ss)
 • System error occurred on Master node with identifier ############. Details: Log forwarding sync update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'configmap', 'fluentbit-config-system', '--namespace', 'vmware-system-logging', '--ignore-not-found=true', '-o', 'json']' returned non-zero exit status 1..
Configured Supervisor Control plane VM's Management Network
Configured Supervisor Control plane VM as Kubernetes Control Plane Node
Configured Supervisor Control plane VM's Workload Network
 • Configuration error (since dd/mm/yyyy, hh:mm:ss)
 • System error occurred on Master node with identifier ############. Details: Timed out waiting for APIServer Pod spec to reflect changes done to manifest file.
 • System error occurred on Master node with identifier ############. Details: Nginx proxy config for Pinniped update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'svc', 'pinniped-supervisor', '--namespace', 'vmware-system-pinniped', '--ignore-not-found=true', '-o', 'jsonpath={.spec.clusterIP}']' returned non-zero exit status 1..
 • System error occurred on Master node with identifier ############. Details: Nginx proxy config for authproxy update failed: Command '['/usr/bin/kubectl', '--kubeconfig', '/etc/kubernetes/admin.conf', 'get', 'secret', 'wcp-authproxy-client-secret', '--namespace', 'kube-system', '--ignore-not-found=true', '-o', 'json']' returned non-zero exit status 1..

  • Running kubectl get nodes displays one or more Supervisor control plane nodes in a NotReady state.
  • Checking the kubelet logs on the affected node with journalctl -xeu kubelet shows authentication errors or "connection refused" errors, for example:

journalctl -xeu kubelet 

Output:

YYYY-MM-DD:T:HH:MM:SS <Node ID> kubelet[######]: status_manager.go:853] "Failed to get status for pod" podUID="###########" pod="kube-system/wcp-authproxy-##########" err="Get \"https://127.0.0.1:6443/api/v1/namespaces/kube-system/pods/wcp-authproxy-########\": dial tcp 127.0.0.1:6443: connect: connection refused"
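The two symptoms above can be checked quickly with a pair of filters. This is a minimal sketch, not a VMware-provided tool: the function names are hypothetical, and each one reads command output on stdin so it can be tried offline. It assumes the STATUS column is the second field of `kubectl get nodes --no-headers`, and it matches only the "connection refused" form of the kubelet error shown above. Authentication errors would need their own pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS column
# (2nd field of `kubectl get nodes --no-headers`) contains NotReady.
notready_nodes() {
  awk '$2 ~ /NotReady/ { print $1 }'
}

# Hypothetical helper: keep only kubelet log lines where the connection to
# the local API server endpoint (127.0.0.1:6443) was refused.
kubelet_apiserver_errors() {
  grep -E '127\.0\.0\.1:6443.*connection refused'
}

# Example usage on the affected Supervisor control plane VM:
#   kubectl get nodes --no-headers | notready_nodes
#   journalctl -u kubelet --no-pager | kubelet_apiserver_errors
```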

Environment

vSphere Kubernetes Service

Cause

  • This issue is typically caused by a hang or failure in the kubelet service on a Supervisor control plane node, preventing it from reporting a Ready status to the API server.
  • A less common cause is that the management network NIC of the affected Supervisor control plane node has been disconnected at the VM level.

Resolution

To resolve this issue, first confirm whether the management network NIC on the affected Supervisor control plane node VM is disconnected.
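The authoritative check is the network adapter's Connected state in the VM settings. As a supplementary check from inside the guest, the sketch below lists interfaces that report no link. It rests on assumptions: that a vNIC disconnected at the VM level appears in the guest as NO-CARRIER or state DOWN in `ip -o link show`, and that you reach the VM through its console, because SSH over the management network may be unavailable while that NIC is disconnected. The function name is hypothetical, and it does not identify which interface carries the management network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -o link show` output on stdin and print the
# names of interfaces reporting no carrier or an operational state of DOWN.
# Splitting on ": " makes field 2 the interface name ("2: eth0: <...> ...").
interfaces_without_link() {
  awk -F': ' '/NO-CARRIER|state DOWN/ { print $2 }'
}

# Example usage from the VM console:
#   ip -o link show | interfaces_without_link
```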

  1. If the management network NIC is disconnected, log in directly to the ESXi Host Client of the host where the affected Supervisor control plane node VM is running.
  2. Select the Supervisor control plane node VM from the VM inventory list.
  3. Edit the VM settings and select the "Connect" check box for the network adapter used by the Supervisor control plane management network.

If the management network NIC is connected, proceed to restart the kubelet service on the affected node.

  1. Access the affected Supervisor Control Plane VM via SSH.
  2. Check whether the root filesystem is full:

    df -h

  3. If usage of the root partition is above 80%, identify and clear large log files to free up space. Refer to the KB article "vSphere Supervisor Root Disk Space Full at 100%".

  4. Restart the kubelet service:

    systemctl restart kubelet

  5. Confirm the node status has returned to Ready by executing:

    kubectl get nodes

  6. Verify that the Config Status in the vCenter UI transitions back to Running.
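Steps 2 through 5 can be strung together as a guarded sketch, run as root on the affected VM. It is an illustration under assumptions, not an official procedure. The 80% threshold comes from step 3. It uses `df -P` instead of `df -h` because the POSIX format keeps each filesystem on one line and is easier to parse. The function names, the node-name argument, and the 300s default timeout are hypothetical. The script waits on the node's Ready condition with `kubectl wait` rather than re-running `kubectl get nodes`, and it does not clear any files itself.

```shell
#!/usr/bin/env bash
# Print root filesystem usage as a bare integer, taken from the
# Capacity column (5th field) of POSIX-format df output.
root_usage_pct() {
  df -P / | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical wrapper: refuse to restart kubelet while the root filesystem
# is above 80%, otherwise restart it and wait for the node to report Ready.
restart_kubelet_and_wait() {
  local node="$1" timeout="${2:-300s}"
  local pct
  pct=$(root_usage_pct)
  if [ "$pct" -gt 80 ]; then
    echo "root filesystem at ${pct}%: free space before restarting kubelet" >&2
    return 1
  fi
  systemctl restart kubelet || return 1
  # Blocks until the node's Ready condition is true or the timeout expires.
  kubectl wait --for=condition=Ready "node/${node}" --timeout="$timeout"
}

# Example usage (node name is a placeholder):
#   restart_kubelet_and_wait <node-name> 300s
```

Afterwards, confirm in the vCenter UI that the Config Status has returned to Running, as in step 6.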

Additional Information

Japanese KB: Supervisorクラスタが構成エラーになる (Supervisor cluster enters a configuration error state)