Provisioning of Tanzu Kubernetes Grid Clusters via Tanzu Mission Control fails

Article ID: 317071

Products

Tanzu Mission Control

Issue/Introduction

Symptoms:
  • After registering a Tanzu Kubernetes Grid (TKG) management cluster as a Tanzu Mission Control (TMC) management cluster, you see that creating workload clusters fails.
  • You can successfully create a similar workload cluster against the same TKG management cluster via the tanzu CLI.
  • When the cluster creation has failed in the TMC UI, you see messages similar to the following in the lcm-agent-extension pod logs in the TKG management cluster:

2021-06-16T16:49:39.600Z ERROR controllers.TanzuVsphereCluster failed to create provisioner

{"tanzuvspherecluster": "default/tmctkg-vsphere-01", "error": "yaml: mapping values are not allowed in this context"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255

2021-06-15T17:03:53.641Z    ERROR    controllers.TanzuVsphereCluster    failed to create provisioner    {"tanzuvspherecluster": "default/tmc-cluster1", "error": "cannot read the bundled providersconfig.yaml: zip: not a valid zip file", "errorVerbose": "zip: not a valid zip file\ncannot read the bundled providersconfig.yaml\ngithub.com/vmware-tanzu-private/tkg-cli/pkg/tkgconfigupdater.

2021-06-16T16:49:39.600Z ERROR controller-runtime.controller Reconciler error

{"controller": "tanzuvspherecluster", "name": "tmctkg-vsphere-01", "namespace": "default", "error": "yaml: mapping values are not allowed in this context"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:231

 

  • You might also see that the lcm-tkg-extension pod crash loops because of an OOM issue (commands for gathering these log and pod details are sketched after the output below):

kubectl describe po -n vmware-system-tmc lcm-tkg-extension-<uuid>

Name:         lcm-tkg-extension-<uuid>
Namespace:    vmware-system-tmc
<output redacted>
    State:          Running
      Started:      Wed, 16 Jun 2021 17:11:54 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 16 Jun 2021 17:09:37 +0000
      Finished:     Wed, 16 Jun 2021 17:11:40 +0000
    Ready:          True
    Restart Count:  2
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:        0
      memory:     64Mi
    Environment:  <none>
 

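If you need to gather this evidence yourself, commands similar to the following can be run against the management cluster (a sketch; lcm-tkg-extension-<uuid> is a placeholder for the actual pod name returned by the first command, and the lcm-agent-extension pod logs can be collected the same way):

# List the TMC extension pods, including their restart counts
kubectl -n vmware-system-tmc get pods

# Current and previous (pre-OOMKill) logs from the lcm-tkg-extension pod
kubectl -n vmware-system-tmc logs lcm-tkg-extension-<uuid>
kubectl -n vmware-system-tmc logs lcm-tkg-extension-<uuid> --previous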

Environment

VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Plus 1.x

Resolution

This is a known issue affecting TKG 1.3.x. There is currently no resolution.

Workaround:
You can work around this issue via the following steps:

Note: Set your kubectl context to that of the management cluster.
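For example, the context can be selected with commands like the following (a sketch; <management-cluster-context> is a placeholder for your management cluster's context name):

# List the available contexts and switch to the management cluster's context
kubectl config get-contexts
kubectl config use-context <management-cluster-context>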

  1. Issue the following command to get the IP address of the node where the lcm-tkg-extension pod is running in the management cluster:

kubectl get node $(kubectl -n vmware-system-tmc get po --selector=app=lcm-tkg-extension -o=custom-columns='NODE:spec.nodeName' | grep -v NODE) -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}'

Note: You will see output similar to the following:

192.###.###.###

  2. Issue the following command to get the lcm-tkg-extension pod UUID value:

kubectl -n vmware-system-tmc get po --selector=app=lcm-tkg-extension -o=custom-columns='UUID:metadata.uid'

Note: You will see output similar to the following:

UUID
c05011dc-4a15-4c11-9a50-5f02a871ecc0

  3. SSH to the node using the node IP address noted in Step 1.
Note: See Connect to Cluster Nodes with SSH for detailed instructions on connecting to a node via SSH.
  4. Issue the sudo -i command to switch to the root user.
  5. Open the /var/lib/kubelet/pods/<UUID>/volumes/kubernetes.io~empty-dir/tkg/config.yaml file with a text editor.
Note: Replace <UUID> with the pod UUID value noted in Step 2.
  6. The first four lines of the file will look like the following:

release:     version: ""
providers:
  - name: cluster-api
    url: /etc/tkg/providers/cluster-api/v0.3.14/core-components.yaml

Note: Split the first line into two lines and indent the second line so that the first four lines look like the following (a scripted alternative to this manual edit is sketched after these steps):

release:
    version: ""
providers:
  - name: cluster-api

  7. Save and close the file.

Note: You can attempt to provision a cluster via the TMC UI at this point.
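As an alternative to the manual edit in Steps 5 through 7, the same change can be scripted once you are root on the node. This is a sketch only; it assumes GNU sed is present on the node and that the first line of config.yaml exactly matches the malformed pattern shown in Step 6:

# Split the malformed first line ('release:     version: ""') into two properly indented lines
sed -i 's/^release: *version: ""$/release:\n    version: ""/' /var/lib/kubelet/pods/<UUID>/volumes/kubernetes.io~empty-dir/tkg/config.yaml

# Confirm the first four lines now match the corrected layout shown in Step 6
head -n 4 /var/lib/kubelet/pods/<UUID>/volumes/kubernetes.io~empty-dir/tkg/config.yaml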

Additional Information

Impact/Risks:
This workaround is ephemeral; if the pod is restarted or recreated, the issue will reoccur.
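To determine whether the workaround needs to be re-applied, you can check whether the pod has been restarted or recreated since the edit (a sketch; compare the restart count and UUID against the values noted earlier):

# The RESTARTS column increments on a container restart; a new UUID means the pod was recreated
kubectl -n vmware-system-tmc get po --selector=app=lcm-tkg-extension
kubectl -n vmware-system-tmc get po --selector=app=lcm-tkg-extension -o=custom-columns='UUID:metadata.uid'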