TKGS Worker Node Stuck in Provisioned state due to NodePool Label containing Spaces

Article ID: 319399

Products

VMware vSphere ESXi, VMware vSphere Kubernetes Service

Issue/Introduction

Symptoms:
  • Nodepool machines remain stuck in the Provisioned state, even though the corresponding VM is powered on and the WCPMachine has been assigned a ProviderID and IP address.
  • When a TKGS guest cluster is created, the control plane nodes reach the Running state without issue, but the nodepool machines remain stuck in the Provisioned state.
  • The guest cluster YAML defines a nodepool label whose value contains spaces.
  • The kubelet on the Provisioned worker node crashes repeatedly with an "unknown command" error that names the word following the first space in the nodepool label value.

  • For example, given the following nodepool label:

labels:
  my/label: standard vmware node

  • The kubelet logs on the Provisioned worker report an "unknown command" error at startup, similar to one of the following:

Error: unknown command vmware
"Unknown command" command="vmware"

This error occurs because of the spaces in the nodepool label value, whether or not the value is enclosed in quotation marks.


NOTE:

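  • If the spaces in the nodepool label value are escaped with backslashes, kubelet instead fails to register the node, because the value does not pass Kubernetes label validation (only alphanumeric characters, '-', '_' and '.' are allowed).
  • For example, given the following nodepool label: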

labels:
  my/label: standard\ vmware\ node

  • The kubelet logs on the Provisioned worker then return an error similar to the following:

"Unable to register node with API server" err="Node \"my-worker-nodepool\" is invalid: metadata.labels: Invalid value: \"standard vmware node\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"
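To check a label value before putting it in the guest cluster YAML, the validation rule quoted in the error above can be reproduced locally. This is a minimal sketch: the regex is copied from the kubelet error, the 63-character limit is the standard Kubernetes cap on label values, and `is_valid_label_value` is a hypothetical helper, not part of any VMware or Kubernetes tool.

```python
import re

# Label-value pattern copied from the kubelet error message above.
# fullmatch() requires the whole string to match; the outer "?" allows "".
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")

# Kubernetes also limits label values to 63 characters.
MAX_LABEL_VALUE_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Hypothetical helper: True if `value` passes Kubernetes label-value rules."""
    return len(value) <= MAX_LABEL_VALUE_LEN and LABEL_VALUE_RE.fullmatch(value) is not None


for value in ["standard vmware node", "standard-vmware-node", ""]:
    print(repr(value), is_valid_label_value(value))
```

The loop prints `False` for the value with spaces and `True` for the hyphenated and empty values. A value written with backslash escapes, such as `standard\ vmware\ node`, also fails, because neither the backslash nor the space is a permitted character.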


Environment

VMware vSphere 8.0 with Tanzu
VMware vSphere 7.0 with Tanzu

Cause

Kubelet does not currently support spaces in certain fields, including node label values. It interprets each space-separated word in the nodepool label value as a separate command-line argument, fails to parse those arguments, and crashes in a continuous loop.
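The word splitting described above can be illustrated with Python's `shlex.split`, which follows POSIX shell rules for unquoted words. This sketch assumes that the nodepool label ends up unquoted inside a space-separated kubelet argument string passed through the `--node-labels` flag; that is an assumption about how TKGS renders kubelet arguments, not something this article confirms. The `--v=2` flag is only an illustrative extra argument.

```python
import shlex

# ASSUMPTION: the nodepool label is rendered unquoted into a space-separated
# kubelet argument string, which is then split into words like a shell would.
label_key = "my/label"
label_value = "standard vmware node"
kubelet_args = f"--node-labels={label_key}={label_value} --v=2"

# POSIX-style word splitting of the unquoted string.
argv = shlex.split(kubelet_args)
print(argv)

# Tokens that do not start with "-" are leftover positional arguments.
# Assumed behavior, consistent with the Symptoms logs: kubelet rejects them
# and reports the first one as an unknown command.
positional = [arg for arg in argv if not arg.startswith("-")]
print(positional[0])
```

The flag value is cut off at `my/label=standard`, and `vmware` becomes the first stray argument, which matches the `unknown command vmware` log lines. Quotation marks in the YAML only delimit the scalar for the YAML parser and are not part of the resulting string, which fits the observation that quoting the label does not help.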

Resolution

Option A) Redeploy the guest cluster without spaces in the nodepool label.

Option B) Edit the guest cluster manifest to remove the spaces from the nodepool label value, for example by replacing them with '-', '_' or '.'.
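When editing the manifest, the replacement value can be derived mechanically. The sketch below uses a hypothetical helper, `sanitize_label_value`, that replaces each run of disallowed characters (such as spaces or backslash escapes) with a separator, trims non-alphanumeric characters from both ends, and enforces the 63-character limit. The character rules come from the validation regex in the kubelet error in the Symptoms section, and the separator choice is an assumption; pick whichever of '-', '_' or '.' suits your labeling convention.

```python
import re

# Characters NOT permitted in a Kubernetes label value.
_DISALLOWED = re.compile(r"[^-A-Za-z0-9_.]+")
# Full label-value pattern, as quoted in the kubelet validation error.
_LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
_MAX_LEN = 63


def sanitize_label_value(value: str, sep: str = "-") -> str:
    """Hypothetical helper: turn e.g. 'standard vmware node' into a valid label value."""
    # Replace every run of disallowed characters (spaces, backslashes, ...) with sep.
    cleaned = _DISALLOWED.sub(sep, value.strip())
    # Enforce the length limit, then make sure both ends are alphanumeric.
    cleaned = cleaned[:_MAX_LEN].strip("-_.")
    if _LABEL_VALUE_RE.fullmatch(cleaned) is None:
        # Only reachable if `sep` itself is not a permitted character.
        raise ValueError(f"cannot sanitize label value: {value!r}")
    return cleaned


print(sanitize_label_value("standard vmware node"))
```

For the example label, this prints `standard-vmware-node`, which can replace the original value under `labels:` in the nodepool definition.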


Additional Information

Multiple Nodepool Labels cause Continuous Worker Node Rollouts on vSphere with Tanzu Guest Cluster (92078)


Impact/Risks:

All worker nodes in a nodepool whose label contains spaces remain in the Provisioned state indefinitely. Kubernetes recreates the worker nodes after 120 minutes, but each new attempt also remains stuck in the Provisioned state.