Guest Cluster node stuck in provisioned state because the "spec.providerID" is missing.

Article ID: 416869

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The Guest Cluster node is stuck in "Provisioned" state.

  • Cluster API (CAPI) cannot transition the node to the "Running" state because it is waiting for a provider ID to be associated with the node.

  • The describe output of the affected Machine shows the corresponding conditions:

    * Machine <machine-ID>:
    * NodeHealthy: Waiting for a Node with spec.providerID vsphere://<provider ID> to exist
    * Control plane components: Waiting for a Node with spec.providerID vsphere://<provider ID> to exist
    * EtcdMemberHealthy: Waiting for a Node with spec.providerID vsphere://<provider ID> to exist

  • In addition, the taint "node.cloudprovider.kubernetes.io/uninitialized" is present on the node.

  • Pods scheduled to the affected node are stuck in the "Pending" state because of this taint:

    Warning  FailedScheduling  <time in sec>s   default-scheduler  nodes are available:
    node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1
    node(s) were unschedulable, node(s) had untolerated taint
    {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption:  nodes
    are available:  Preemption is not helpful for scheduling.
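The symptoms above can be confirmed with a few kubectl queries. This is a sketch; <machine-name>, <node-name>, and <cluster-namespace> are placeholders for the affected objects.

```shell
# Machine conditions reported by CAPI (the "Waiting for a Node with
# spec.providerID ..." messages appear here).
kubectl describe machine <machine-name> -n <cluster-namespace>

# Taints currently on the node -- the uninitialized taint should be listed.
kubectl get node <node-name> -o jsonpath='{.spec.taints}'
```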

Environment

vSphere Kubernetes Service

Cause

The spec.providerID field on the node has not yet been set. The vSphere Cloud Provider Interface (CPI) is responsible for setting the provider ID during node initialization: CPI writes the provider ID into the `node` object, CAPI then moves the node to `Running`, and the taint is removed once the provider ID is successfully associated with the node.
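Given this cause, a quick check is whether spec.providerID is populated on each node. A minimal sketch using kubectl's custom-columns output (the column names are arbitrary):

```shell
# Nodes whose PROVIDER_ID column shows <none> have not yet been
# initialized by the vSphere CPI.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```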

Resolution

NOTE: Removing the taint or setting the provider ID manually is not recommended.

The vSphere Cloud Provider Interface runs as a pod inside the guest cluster; under normal circumstances it is a single Running pod in the vmware-system-cloud-provider namespace. Look inside the guest cluster to check whether anything is preventing the Cloud Provider Interface from assigning a provider ID to the node.
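The CPI pod's status and logs can indicate why the provider ID is not being set. A sketch of the checks, assuming the deployment and namespace names shown elsewhere in this article:

```shell
# Confirm the CPI pod exists and is Running.
kubectl get pods -n vmware-system-cloud-provider

# Review its logs for errors during node initialization.
kubectl logs -n vmware-system-cloud-provider deployment/guest-cluster-cloud-provider
```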

This issue also occurs when the Cloud Provider Interface pods are missing from the guest cluster. In the example below, the guest-cluster-cloud-provider deployment is scaled down to zero:

root@<Node-ID> [ ~ ]# k get deployment -A | grep -i cloud
vmware-system-cloud-provider   guest-cluster-cloud-provider   0/0      

To fix the issue:

  1. Scale the guest-cluster-cloud-provider deployment back up to 1:

    root@<Node-ID> [ ~ ]# k scale deployment guest-cluster-cloud-provider -n vmware-system-cloud-provider --replicas=1
    deployment/guest-cluster-cloud-provider scaled

  2. The pod should be created and return to a Running state:

    root@<Node-ID> [ ~ ]# k get pods -A | grep -i cloud
    vmware-system-cloud-provider   guest-cluster-cloud-provider-<ID>                 1/1     Running     0
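Once the CPI pod is Running, it should set the provider ID on the node shortly afterward. The fix can be verified along these lines (<node-name> is a placeholder):

```shell
# providerID should now be populated (vsphere://<provider ID>).
kubectl get node <node-name> -o jsonpath='{.spec.providerID}'

# The node.cloudprovider.kubernetes.io/uninitialized taint should no
# longer appear in the node's taints.
kubectl get node <node-name> -o jsonpath='{.spec.taints}'
```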