Workload Cluster Creation in Air-gapped Environments Fails Because the kapp-controller Pod Pulls Its Image from the Public Repository Instead of the Private Repository



Article ID: 376688


Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • Tanzu workload cluster creation in air-gapped environments fails after the first control plane node is created.
  • The kapp-controller pod on the workload cluster is stuck in the Init:ImagePullBackOff state.

tkg-system    kapp-controller-6b87746484-xqzp9                         0/2     Init:ImagePullBackOff   0          88m

  • Describing the kapp-controller pod on the workload cluster shows that it fails to pull the kapp-controller image from the public registry "projects.registry.vmware.com" instead of the private registry "private-registry.domain.com".

Events:
  Type    Reason   Age                    From     Message
  ----    ------   ----                   ----     -------
  Normal  BackOff  3m51s (x282 over 68m)  kubelet  Back-off pulling image "projects.registry.vmware.com/tkg/kapp-controller:v0.30.0_vmware.1"

  • The workload cluster kapp-controller pod tries to pull the image from the public repository instead of the private repository that was used to create the workload cluster.
  • The TKG_CUSTOM_IMAGE_REPOSITORY variables are set in the config.yaml file that was used to create the workload cluster, and they point to the private registry:

TKG_CUSTOM_IMAGE_REPOSITORY: private-registry.domain.com/k8sbuild/projects.registry.vmware.com/tkg

TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: "false"

TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVyakNDQTVhZ0F3SUJBZ0lRR2NKOVgwN2V3YWxIVUZNVDVZT0g3akFOQmdrcWhraUc5dzBCQVFzRkFEQVUKTVJJd0VBWURWUVFERXdsWFUwbFNiMjkwUTBFd0hoY05NVGN3TWpFM01UZ3hRREV3bFhVMGxTYjI5MFEwRXdnZ0VpTUEwR0NTcUdTSWIzRFFFQkFRVUFBNElCCkR3QXdnZ0VLQW > snip< 9LSEdNQzR3V0prOGMxdU1URG93NGp6dWV1ZzIzU2Y1ZnpxYk5HZDl6dApkQXBiSlR2VDFGSWZCSzEzL3AvVUkxVTdnY0lWNzdTczdTUi9rVm52OGMxZV

  • Other workload cluster pods pull their images from private-registry.domain.com without problems.

Example:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  35s   default-scheduler  Successfully assigned kube-system/coredns-5f65795997-rvd49 to workload-slot35rp02-md-0-84b99b6ccc-mpxwb
  Normal  Pulling    35s   kubelet            Pulling image "private-registry.domain.com/k8sbuild/projects.registry.vmware.com/tkg/coredns:v1.8.6_vmware.17"
  Normal  Pulled     12s   kubelet            Successfully pulled image "private-registry.domain.com/k8sbuild/projects.registry.vmware.com/tkg/coredns:v1.8.6_vmware.17" in 22.805351343s
  Normal  Created    12s   kubelet            Created container coredns
  Normal  Started    12s   kubelet            Started container coredns


  • The kapp-controller PackageInstall for the workload cluster, which runs on the management cluster, is in the ReconcileFailed state.

# kubectl describe pkgi workload_cluster_name-kapp-controller -n <namespace>

Name:         workload_cluster_name-kapp-controller
Namespace:    default
Labels:       <none>
Annotations:  tkg.tanzu.vmware.com/cluster-name: workload_cluster_name
              tkg.tanzu.vmware.com/cluster-namespace: default
API Version:  packaging.carvel.dev/v1alpha1
Kind:         PackageInstall

 

Status:
  Conditions:
    Message:               Error (see .status.usefulErrorMessage for details)
    Status:                True
    Type:                  ReconcileFailed
  Friendly Description:    Reconcile failed: Error (see .status.usefulErrorMessage for details)
  Last Attempted Version:  0.48.2+vmware.1-tkg.1
  Observed Generation:     1
  Useful Error Message:    kapp: Error: waiting on reconcile deployment/kapp-controller (apps/v1) namespace: tkg-system:
  Finished unsuccessfully (Deployment is not progressing: ProgressDeadlineExceeded (message: ReplicaSet "kapp-controller-6b87746484" has timed out progressing.))
  Version:  0.48.2+vmware.1-tkg.1
Events:     <none>

  • The management cluster was deployed using the public repository projects.registry.vmware.com.


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
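To check which pods are affected, one option is to list the images referenced by the pods in tkg-system and flag any that do not come from the private registry. The following is a minimal Bash sketch, not an official tool: the function name flag_non_private_images is hypothetical, and private-registry.domain.com stands in for your own registry host. Init containers are included because the failing pod above is stuck in the Init phase.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one image reference per line on stdin and print
# every reference that does NOT start with "<registry>/".
flag_non_private_images() {
  local registry="$1"
  local image
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r image || [ -n "$image" ]; do
    # Skip blank lines (for example, pods without init containers).
    if [ -z "$image" ]; then
      continue
    fi
    case "$image" in
      "$registry"/*) ;;              # pulled from the private registry: OK
      *) printf '%s\n' "$image" ;;   # e.g. projects.registry.vmware.com/...
    esac
  done
}
```

Example usage, run with the kubeconfig context of the workload cluster:

kubectl get pods -n tkg-system -o jsonpath='{range .items[*]}{range .spec.initContainers[*]}{.image}{"\n"}{end}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | flag_non_private_images private-registry.domain.com

In the scenario described in this article, the kapp-controller image from projects.registry.vmware.com would be expected to appear in the output.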

Environment

VMware Tanzu Kubernetes Grid.

VMware Tanzu Kubernetes Grid Management (TKGm).

Cause


  • The workload cluster kapp-controller is installed and managed as a package by the kapp-controller on the management cluster.
  • The workload cluster kapp-controller inherits its settings from the management cluster kapp-controller, and one of these settings is the image registry.
  • Because the management cluster was created using the public repository projects.registry.vmware.com/tkg, the workload cluster kapp-controller pod pulls its image from the public registry instead of the private registry. In an air-gapped environment, this image pull fails.
  • Tanzu Kubernetes Grid currently supports only one registry setting, which means a workload cluster cannot use a dedicated registry that differs from the one used by the management cluster.
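Because the workload cluster kapp-controller inherits its image registry from the management cluster, a quick check is to compare the registry host of the management cluster's kapp-controller image with the host in TKG_CUSTOM_IMAGE_REPOSITORY. Below is a minimal Bash sketch; same_registry_host and to_lower are hypothetical helpers, and the kubectl command in the usage note assumes the management cluster runs kapp-controller as the deployment kapp-controller in the tkg-system namespace, as the workload cluster does in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower-case a string (works in Bash 3 and Bash 4+).
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: compare the registry host of an image reference
# (the text before the first "/") with the host part of a
# TKG_CUSTOM_IMAGE_REPOSITORY value. Assumes fully qualified image references.
# Returns 0 when the hosts match (case-insensitively), 1 otherwise.
same_registry_host() {
  local image="$1" custom_repo="$2"
  local image_host="${image%%/*}"
  local repo_host="${custom_repo%%/*}"
  if [ "$(to_lower "$image_host")" = "$(to_lower "$repo_host")" ]; then
    return 0
  fi
  printf 'mismatch: image uses %s, TKG_CUSTOM_IMAGE_REPOSITORY uses %s\n' \
    "$image_host" "$repo_host" >&2
  return 1
}
```

Example usage, run with the kubeconfig context of the management cluster:

mgmt_image="$(kubectl -n tkg-system get deployment kapp-controller -o jsonpath='{.spec.template.spec.containers[0].image}')"
same_registry_host "$mgmt_image" "$TKG_CUSTOM_IMAGE_REPOSITORY"

A mismatch here suggests the management cluster was deployed from a different registry than the one the workload cluster is configured to use.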

Resolution


  • Create a new management cluster using the private repository, and then create the workload cluster again.

Note: Because the workload cluster must be created using a private repository, the management cluster that creates the workload cluster must be created using the same private repository.
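Before recreating the clusters, it can help to confirm that the management cluster configuration file and the workload cluster configuration file set TKG_CUSTOM_IMAGE_REPOSITORY to the same value. The Bash sketch below assumes both files use the flat "KEY: value" layout shown in the config.yaml excerpt above; get_custom_repo and same_custom_repo are hypothetical helpers, and the file names in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the TKG_CUSTOM_IMAGE_REPOSITORY value from a flat
# "KEY: value" file, without quotes or trailing whitespace (including \r).
# The ":" right after the key keeps TKG_CUSTOM_IMAGE_REPOSITORY_* keys out.
get_custom_repo() {
  sed -n 's/^TKG_CUSTOM_IMAGE_REPOSITORY:[[:space:]]*//p' "$1" \
    | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: succeed only if the management cluster file ($1) sets
# the variable and the workload cluster file ($2) sets the same value.
same_custom_repo() {
  local mgmt_repo wl_repo
  mgmt_repo="$(get_custom_repo "$1")"
  wl_repo="$(get_custom_repo "$2")"
  if [ -z "$mgmt_repo" ]; then
    echo "TKG_CUSTOM_IMAGE_REPOSITORY is not set in $1" >&2
    return 1
  fi
  if [ "$mgmt_repo" != "$wl_repo" ]; then
    echo "repositories differ: '$mgmt_repo' vs '$wl_repo'" >&2
    return 1
  fi
  return 0
}
```

Example usage, where mgmt-config.yaml is the configuration file for the new management cluster and config.yaml is the workload cluster configuration file:

same_custom_repo mgmt-config.yaml config.yaml && tanzu management-cluster create --file mgmt-config.yaml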