Cluster creation stuck in pending state due to capi-controller-manager in crashloop state

Article ID: 416442

Products

VMware vSphere Kubernetes Service

Issue/Introduction

During cluster creation, capi-controller-manager can fail to get the ClusterClass from the runtime-extension pods. When this happens, the cluster remains stuck in a pending state, its machines loop between provisioning and deleting, and the capi-controller-manager pods go into a CrashLoopBackOff state.

The capi-controller-manager logs contain messages of the following form:

"Reconciler error" err="failed to discover variables for ClusterClass builtin-generic-v3.1.0: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": failed to get extension handler \"discover-variables.runtime-extension\" from registry: invalid operation: Get cannot be called on a registry not yet ready" controller="clusterclass" 
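
As a rough aid for confirming the symptom, the sketch below filters saved log lines for the two fragments of the message quoted above. The function name `find_registry_not_ready` is hypothetical, and the pattern assumes the message wording matches this article; other versions may phrase it differently.

```python
import re
from typing import Iterable, List

# Both fragments are copied from the log message quoted in this article.
SIGNATURE = re.compile(
    r"failed to discover variables for ClusterClass.*"
    r"Get cannot be called on a registry not yet ready"
)


def find_registry_not_ready(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the 'registry not yet ready' error."""
    return [line.rstrip("\n") for line in lines if SIGNATURE.search(line)]
```

The function accepts any iterable of strings, for example an open file containing the saved output of `kubectl logs` for a capi-controller-manager pod. The pod name and namespace vary by environment and are not given in this article.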

 

Environment

8.0U3

Cause

This occurs when runtime-extension-controller-manager fails to reach the Supervisor node. Its logs may show certificate errors or similar failures.
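
To narrow down the runtime-extension-controller-manager logs, a small filter like the one below may help. It is only a sketch: the keyword list is an assumption (this article does not quote the exact certificate errors), and `certificate_related` is a hypothetical helper name.

```python
from typing import Iterable, List

# Assumed keywords; the article does not list the exact error text,
# so adjust these to match what the logs actually contain.
CERT_KEYWORDS = ("x509", "certificate")


def certificate_related(lines: Iterable[str]) -> List[str]:
    """Return log lines that mention any of the assumed certificate keywords."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in CERT_KEYWORDS)
    ]
```

An empty result does not rule out this cause, since the failure may surface as a different connection error.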

Resolution

To resolve the issue, reboot the Supervisor node and perform a rollout restart of the runtime-extension-controller-manager deployment.
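
The rollout restart can be scripted along the following lines. This is a minimal sketch, assuming `kubectl` is already configured against the Supervisor. The namespace is deliberately a required parameter because this article does not state it (`kubectl get deployments -A` can be used to locate it), and the helper names are hypothetical. The node reboot is not covered here.

```python
import subprocess
from typing import List

# Deployment name as given in this article.
DEPLOYMENT = "runtime-extension-controller-manager"


def rollout_restart_command(namespace: str, deployment: str = DEPLOYMENT) -> List[str]:
    """Build the kubectl argument list for a rollout restart of the deployment."""
    if not namespace:
        # The namespace is environment-specific and must be supplied by the caller.
        raise ValueError("namespace is required")
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def restart(namespace: str) -> None:
    """Run the restart; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(rollout_restart_command(namespace), check=True)
```

Building the argument list separately from running it makes it possible to print and review the command before executing it.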