Error: "error while bootstrapping the machine [<cluster_name>/EPHEMERAL-TEMP-VM]; timeout for post customization phase [guestinfo.cloudinit.kind.cluster.creation.status]" when trying to create a Tanzu Kubernetes Grid Cluster

Article ID: 382500

Products

VMware Cloud Director

Issue/Introduction

  • Creating Tanzu Kubernetes Grid clusters by using the Kubernetes Container Clusters plug-in fails.
  • In the Kubernetes Container Clusters plug-in, the Events for the failed cluster show a ScriptExecutionTimeout error with a message of the form:

[error while bootstrapping the machine [test01-psql/EPHEMERAL-TEMP-VM]; timeout for post customization phase [guestinfo.cloudinit.kind.cluster.creation.status]] during cluster creation

  • The /var/log/cloud-final.err log on the EPHEMERAL_TEMP_VM shows that creation completed, but only after a long time:

<time_stamp> <cluster_name>@<org_name>/<user_name>:
Cloud-init v. 23.2.2-0ubuntu0~20.04.1 running 'modules:final' at <time_stamp>. Up 12.34 seconds.
Docker, KinD, clusterctl installation, and <cluster_name> creation completed after 5123.45 seconds

  • The /var/log/cloud-final.err log on the EPHEMERAL_TEMP_VM shows slow download speeds from the configured container registry:

<time_stamp> <cluster_name>@<org_name>/<user_name>: projects.packages.broadcom.com/vmware-cloud-director/kind-airgapped:v0.19.0:      resolved       |ESC[32m++++++++++++++++++++++++++++++++++++++ESC[0m|
<time_stamp> <cluster_name>@<org_name>/<user_name>: manifest-sha256:################################################################: done           |ESC[32m++++++++++++++++++++++++++++++++++++++ESC[0m|
<time_stamp> <cluster_name>@<org_name>/<user_name>: layer-sha256:################################################################:    downloading    |ESC[32m+++ESC[0m-----------------------------------|  8.0 MiB/89.5 MiB
<time_stamp> <cluster_name>@<org_name>/<user_name>: config-sha256:################################################################:   done           |ESC[32m++++++++++++++++++++++++++++++++++++++ESC[0m|
<time_stamp> <cluster_name>@<org_name>/<user_name>: elapsed: 30.2s                                                                    total:  8.0 Mi (271.3 KiB/s)
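
To judge how slow a download like the one above really is, the progress and summary lines can be turned into a rough time estimate. The following Python sketch is illustrative only: the function name and regular expressions are hypothetical, they are based solely on the two line shapes shown in the excerpt above, and the estimate assumes the reported rate stays constant.

```python
import re

# Progress line ending, e.g. "...|  8.0 MiB/89.5 MiB" (shape taken from the excerpt above).
PROGRESS_RE = re.compile(r"([\d.]+)\s*MiB/([\d.]+)\s*MiB\s*$")
# Rate in the summary line, e.g. "(271.3 KiB/s)".
RATE_RE = re.compile(r"\(([\d.]+)\s*(B|KiB|MiB|GiB)/s\)")

# Conversion factors to KiB for the units the rate may be reported in.
UNIT_TO_KIB = {"B": 1 / 1024, "KiB": 1, "MiB": 1024, "GiB": 1024 * 1024}


def estimate_remaining_seconds(log_lines):
    """Estimate seconds left for the layer being downloaded.

    Uses the last progress line and the last rate seen in log_lines.
    Returns None if either value is missing or the rate is zero.
    """
    done_mib = total_mib = rate_kib_s = None
    for line in log_lines:
        progress = PROGRESS_RE.search(line)
        if progress:
            done_mib = float(progress.group(1))
            total_mib = float(progress.group(2))
        rate = RATE_RE.search(line)
        if rate:
            rate_kib_s = float(rate.group(1)) * UNIT_TO_KIB[rate.group(2)]
    if None in (done_mib, total_mib, rate_kib_s) or rate_kib_s <= 0:
        return None
    remaining_kib = (total_mib - done_mib) * 1024
    return remaining_kib / rate_kib_s


if __name__ == "__main__":
    sample = [
        "layer-sha256:...:    downloading    |+++-----|  8.0 MiB/89.5 MiB",
        "elapsed: 30.2s      total:  8.0 Mi (271.3 KiB/s)",
    ]
    seconds = estimate_remaining_seconds(sample)
    print(f"Estimated time remaining for this layer: {seconds:.0f} s")
```

With the values from the excerpt (81.5 MiB remaining at 271.3 KiB/s), this gives roughly 308 seconds, or about five minutes, for a single layer of a single image. Several images downloaded at that rate can add up to a long bootstrap phase.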

Environment

  • VMware Cloud Director 10.6
  • VMware Cloud Director 10.5
  • VMware Cloud Director Container Service Extension 4.2

Cause

This issue occurs if the time taken by the EPHEMERAL_TEMP_VM to download the required images and deploy the bootstrap cluster is longer than the Container Service Extension Server's timeout.

If the EPHEMERAL_TEMP_VM takes too long to complete the bootstrap phase, the Container Service Extension Server marks the cluster creation with the ScriptExecutionTimeout error, which is shown in the Kubernetes Container Clusters plug-in Events for the failed cluster.
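
As a rough way to confirm this cause, the completion time recorded in /var/log/cloud-final.err can be compared against the timeout configured for the environment. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the regular expression is based only on the "creation completed after ... seconds" line shown in the Issue/Introduction section, and the timeout is a parameter that must be supplied from the actual Container Service Extension Server configuration. The value used in the example is a placeholder, not a product default.

```python
import re

# Matches the completion line shown in cloud-final.err, e.g.
# "... and <cluster_name> creation completed after 5123.45 seconds"
COMPLETED_RE = re.compile(r"creation completed after ([\d.]+) seconds")


def bootstrap_seconds(log_text):
    """Return the reported bootstrap duration in seconds, or None if not found."""
    match = COMPLETED_RE.search(log_text)
    return float(match.group(1)) if match else None


def exceeded_timeout(log_text, timeout_seconds):
    """Return True if the reported duration is longer than timeout_seconds.

    Returns None when the log does not contain a completion line.
    """
    elapsed = bootstrap_seconds(log_text)
    if elapsed is None:
        return None
    return elapsed > timeout_seconds


if __name__ == "__main__":
    log_text = (
        "Docker, KinD, clusterctl installation, and test01 "
        "creation completed after 5123.45 seconds"
    )
    # Placeholder only: replace with the timeout configured in your environment.
    assumed_timeout_seconds = 3600.0
    print(exceeded_timeout(log_text, assumed_timeout_seconds))
```

If the reported duration is longer than the configured timeout, the bootstrap did eventually finish, but only after the server had already marked the creation as failed.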

Resolution

Ensure that the EPHEMERAL_TEMP_VM has sufficient bandwidth to download the required images from the configured container registry.
For example, increase any rate limits on the networks used to reach the container registry.
By default, Container Service Extension uses the Broadcom container registry, "projects.packages.broadcom.com".
NOTE: Older versions of Container Service Extension may be using the VMware URL, "projects.registry.vmware.com", which redirects to the Broadcom one.

If download speeds from the Broadcom container registry cannot be increased, consider hosting the images in a local container registry so that the EPHEMERAL_TEMP_VM can download them over a local network.
Details are in the Container Service Extension documentation, under Set up a Local Container Registry in an Air-gapped Environment.

It may also be possible to increase the Container Service Extension Server's timeouts to allow more time for cluster creation. This option is not preferred because it introduces further delays into cluster operations. For details, see Increasing Timeouts for Cloud Director Container Service Extension to resolve slow Tanzu Kubernetes Grid cluster creation errors.