Machines Are Failing to be Provisioned on Existing Cluster Due to Node Name Exceeding 63-character Limit During Scale Operation

Article ID: 412930

Updated On:

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

New workers/machines fail to be provisioned after a scale operation on an existing cluster. The machines enter a provisioning loop and never reach the Running state. As a result, the cluster is under-resourced and pods cannot be scheduled.

Reviewing the machine objects shows that the affected workers are stuck in the Provisioned state with an empty nodename field. Other machines are stuck deleting, and only a subset of worker nodes remain in the Running state.
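
As a rough illustration of that review, the sketch below filters Machine objects that have no node reference. It assumes the JSON shape produced by kubectl get machines -A -o json (an items list whose entries carry status.phase and status.nodeRef, as in Cluster API). The sample data and the helper name machines_without_node are hypothetical and not part of any Tanzu tooling.

```python
import json


def machines_without_node(machine_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, phase) for Machines that have no node reference."""
    stuck = []
    for item in machine_list.get("items", []):
        status = item.get("status", {})
        # A Machine whose kubelet never registered has no nodeRef in its status.
        if not status.get("nodeRef"):
            meta = item.get("metadata", {})
            stuck.append(
                (meta.get("namespace", ""), meta.get("name", ""), status.get("phase", ""))
            )
    return stuck


# Hypothetical sample, shaped like `kubectl get machines -A -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "default", "name": "workers-a"},
     "status": {"phase": "Running", "nodeRef": {"name": "workers-a"}}},
    {"metadata": {"namespace": "default", "name": "workers-b"},
     "status": {"phase": "Provisioned"}}
  ]
}
""")

for namespace, name, phase in machines_without_node(sample):
    print(f"{namespace}/{name} is in phase {phase} with no node")
```

In practice the input would come from the management cluster rather than an inline sample; the filter itself only relies on the absence of a node reference.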

Logs from kubelet/journal on the worker VMs show repeated registration failures:

Unable to register node with API server: Node "<nodename>.<domain>" is invalid: metadata.labels: Invalid value: "<nodename>.<domain>": must be no more than 63 characters

Cause

Kubernetes enforces a 63-character limit on label values, and the node name is used as a metadata label value. The combination of the cluster name, the MachineDeployment suffix, and the FQDN domain appended from DHCP produces node names longer than 63 characters. The kubelet therefore fails to register with the API server, leaving the Machines stuck in the Provisioned state without a node.
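
The length check described above can be sketched as follows. This is a minimal illustration, assuming the registered node name is the VM hostname followed by the DHCP-supplied domain; the hostname, the domain, and the function names are hypothetical values chosen for the example.

```python
# Kubernetes label values may be at most 63 characters long.
MAX_LABEL_VALUE_LENGTH = 63


def registered_node_name(hostname: str, dhcp_domain: str = "") -> str:
    """Return the name the node would register with if the DHCP domain is appended."""
    return f"{hostname}.{dhcp_domain}" if dhcp_domain else hostname


def exceeds_limit(node_name: str) -> bool:
    """Return True when the node name is too long to be used as a label value."""
    return len(node_name) > MAX_LABEL_VALUE_LENGTH


# Hypothetical values, for illustration only.
hostname = "prod-workload-cluster-01-md-0-7d9f8b6c5d-x2k4q"
domain = "datacenter.example.internal"

name = registered_node_name(hostname, domain)
print(name, len(name), exceeds_limit(name))
```

In this example the short hostname alone fits within the limit, and it is only the appended domain that pushes the full name past 63 characters, which matches the failure pattern described in this article.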

Resolution

If you experience this issue, collect the following information and contact Broadcom Support for assistance:

  • From management cluster context:
    • Output of tanzu cluster list
    • Output of kubectl get machines -A
    • Output of kubectl get md -A
  • From guest cluster context:
    • Output of kubectl get nodes
  • Recent kubelet/journal logs from one of the affected worker nodes

Broadcom Support will review the environment configuration and provide guidance specific to your deployment.
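
The collection steps listed above could be scripted roughly as in the sketch below. It is an illustration only, not a Broadcom-provided tool: the kubectl context names are hypothetical parameters, tanzu cluster list is assumed to run against whichever management cluster the tanzu CLI is currently logged in to, and the kubelet logs still have to be gathered separately on an affected worker node.

```python
import subprocess


def collection_commands(mgmt_context: str, guest_context: str) -> list[list[str]]:
    """Build the commands from the list above for the given kubectl contexts."""
    return [
        ["tanzu", "cluster", "list"],
        ["kubectl", "--context", mgmt_context, "get", "machines", "-A"],
        ["kubectl", "--context", mgmt_context, "get", "md", "-A"],
        ["kubectl", "--context", guest_context, "get", "nodes"],
    ]


def collect(commands: list[list[str]], run=subprocess.run) -> dict[str, str]:
    """Run each command and return its combined stdout and stderr, keyed by command line."""
    outputs = {}
    for cmd in commands:
        # check=False keeps going even if one command fails, so partial data is still captured.
        result = run(cmd, capture_output=True, text=True, check=False)
        outputs[" ".join(cmd)] = result.stdout + result.stderr
    return outputs


# Example usage (context names are hypothetical; requires the tanzu and kubectl CLIs):
# outputs = collect(collection_commands("mgmt-admin@mgmt", "workload-admin@workload"))
# for command, text in outputs.items():
#     print(f"$ {command}\n{text}")
```

The run parameter exists only so the collector can be exercised without the CLIs installed; by default it calls subprocess.run.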