The environment where the Tanzu Kubernetes Grid (TKG) management cluster is being deployed is air-gapped or internet restricted.
Management cluster creation fails, with components failing to start on the kind cluster.
You see messages similar to the following in the kubelet logs in the kind cluster:
Jul 21 10:34:16 tkg-kind-c3rvc9c2ebhs2blepc6g-control-plane kubelet[585]: E0721 10:34:16.734509 585 pod_workers.go:191] Error syncing pod 51f2dab2-264e-493e-9f91-812b81b761ba ("kube-proxy-zkcnz_kube-system(51f2dab2-264e-493e-9f91-812b81b761ba)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-zkcnz_kube-system(51f2dab2-264e-493e-9f91-812b81b761ba)"
Jul 21 10:34:16 tkg-kind-c3rvc9c2ebhs2blepc6g-control-plane kubelet[585]: E0721 10:34:16.965954 585 manager.go:1123] Failed to create existing container: /docker/9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e: failed to identify the read-write layer ID for container "9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e". - open /var/lib/docker/image/overlay2/layerdb/mounts/9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e/mount-id: no such file or directory
Jul 21 10:34:16 tkg-kind-c3rvc9c2ebhs2blepc6g-control-plane kubelet[585]: E0721 10:34:16.967054 585 manager.go:1123] Failed to create existing container: /docker/9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e/docker/9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e: failed to identify the read-write layerID for container "9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e". - open /var/lib/docker/image/overlay2/layerdb/mounts/9f6a79717cf271bed7546c36a900c15d77e1c0513b553d2baecfc8d20c9ab17e/mount-id: no such file or directory

2021-07-21T10:58:39.962429341Z stderr F I0721 10:58:39.961983 1 node.go:172] Successfully retrieved node IP: 1##.##.#.#
2021-07-21T10:58:39.96247252Z stderr F I0721 10:58:39.962062 1 server_others.go:142] kube-proxy node IP is an IPv4 address (1##.##.#.#), assume IPv4 operation
2021-07-21T10:58:39.975832251Z stderr F W0721 10:58:39.975625 1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
2021-07-21T10:58:39.975853193Z stderr F I0721 10:58:39.975733 1 server_others.go:185] Using iptables Proxier.
2021-07-21T10:58:39.976165859Z stderr F I0721 10:58:39.976040 1 server.go:650] Version: v1.20.5+vmware.1
2021-07-21T10:58:39.976482573Z stderr F I0721 10:58:39.976419 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
2021-07-21T10:58:39.976491689Z stderr F F0721 10:58:39.976441 1 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
Note: You can view the logs from the kind cluster by using the kubectl logs command with the kubeconfig file located in the .kube-tkg/tmp folder.
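As an illustration of the note above, the following is a minimal sketch for locating the kubeconfig file to pass to kubectl. The helper name latest_tmp_kubeconfig is hypothetical, and the sketch assumes that the kubeconfig for the current bootstrap attempt is the most recently modified file in the .kube-tkg/tmp folder; verify the file before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified regular file in the
# given folder (defaults to ~/.kube-tkg/tmp).
# Assumption: the newest file there is the bootstrap cluster's kubeconfig.
latest_tmp_kubeconfig() {
  local dir="${1:-$HOME/.kube-tkg/tmp}"
  local newest="" f
  for f in "$dir"/*; do
    # Skip anything that is not a regular file (also covers an empty folder,
    # where the glob is left unexpanded).
    [ -f "$f" ] || continue
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  # Fail when no candidate file was found.
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Example usage (pod name taken from the kubelet log above):
#   kubectl --kubeconfig "$(latest_tmp_kubeconfig)" get pods -A
#   kubectl --kubeconfig "$(latest_tmp_kubeconfig)" logs -n kube-system kube-proxy-zkcnz
```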
Creating the kind cluster by following the documented steps fails in air-gapped environments because the CA certificate (encoded in the TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE variable) is not injected into the manually created bootstrap kind cluster.
To work around this, create the kind cluster with a configuration that adds the Harbor registry CA certificate to the containerd config file. Create a kind.yml file with contents similar to the following:
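Because the variable holds the certificate in base64-encoded form, the certificate must exist as a plain file on the bootstrap machine before it can be shared with the kind node. The following is a minimal sketch of that step, not a documented procedure. The helper name write_registry_ca is hypothetical, the file name ca.crt and the target folder under /etc/docker/certs.d are assumptions, and the sketch assumes a base64 command that accepts -d (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a base64-encoded CA certificate and write it
# to <cert_dir>/ca.crt.
# Assumption: `base64 -d` is available (GNU coreutils).
write_registry_ca() {
  local encoded="$1" cert_dir="$2"
  mkdir -p "$cert_dir" || return 1
  printf '%s' "$encoded" | base64 -d > "$cert_dir/ca.crt"
}

# Example usage (the target folder is an assumption; writing under /etc
# usually requires root privileges):
#   write_registry_ca "$TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE" \
#     /etc/docker/certs.d/harbor1.corp.tanzu
```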
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: tkg-kind
nodes:
- role: control-plane
  # This option mounts the host docker registry folder into
  # the control-plane node, allowing containerd to access them.
  extraMounts:
  - containerPath: /etc/containerd/harbor1.corp.tanzu
    hostPath: /etc/docker/certs.d/harbor1.corp.tanzu
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor1.corp.tanzu".tls]
    ca_file = "/etc/containerd/harbor1.corp.tanzu/ca.crt"
Notes:
ca_file -- CA certificate of the Harbor registry
hostPath -- path of the Harbor CA certificate on the bootstrap VM where the kind cluster is created
containerPath -- path of the Harbor CA certificate in the kind container
Replace "harbor1.corp.tanzu" with your registry name.
The cluster is named "tkg-kind" because, if this kind cluster fails to bootstrap, it is easy to collect a crashd log bundle: by default, crashd looks for a kind cluster with this name.
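To avoid editing the registry name in several places by hand, the configuration can be generated from a single value. The following is a minimal sketch that reproduces the same structure as the example configuration. The function name render_kind_config is hypothetical, and the sketch assumes that the certificate folders follow the same /etc/docker/certs.d and /etc/containerd layout used in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a kind configuration for the given registry host.
# Assumption: the CA certificate is stored as ca.crt under
# /etc/docker/certs.d/<registry> on the bootstrap VM.
render_kind_config() {
  local registry="$1"
  # Unquoted heredoc: ${registry} is expanded, double quotes stay literal.
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: tkg-kind
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /etc/containerd/${registry}
    hostPath: /etc/docker/certs.d/${registry}
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.configs."${registry}".tls]
    ca_file = "/etc/containerd/${registry}/ca.crt"
EOF
}

# Example usage:
#   render_kind_config harbor1.corp.tanzu > kind.yml
```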
Create the kind cluster using this configuration file:
kind create cluster --config kind.yml
Note: You can exec into the kind container if required and check if the file /etc/containerd/config.toml has been populated with the expected registry info and certificate data. Optionally, you can try pulling an image from the registry using a command similar to the following:
crictl pull <imagename>
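The check of /etc/containerd/config.toml described in the note can also be scripted. The following is a minimal sketch, assuming the node container is named tkg-kind-control-plane (kind names node containers after the cluster) and that the patch appears in the file in the same form as in the kind configuration. The function name has_registry_ca is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a containerd config.toml from stdin and succeed
# only if it contains the TLS section and ca_file entry for the registry.
# Assumption: the entries appear in the same textual form as in kind.yml.
has_registry_ca() {
  local registry="$1" config
  config="$(cat)"
  # -F matches fixed strings, so dots and brackets are taken literally.
  printf '%s\n' "$config" | grep -qF "registry.configs.\"${registry}\".tls]" &&
    printf '%s\n' "$config" | grep -qF "ca_file = \"/etc/containerd/${registry}/ca.crt\""
}

# Example usage:
#   docker exec tkg-kind-control-plane cat /etc/containerd/config.toml \
#     | has_registry_ca harbor1.corp.tanzu && echo "registry CA is configured"
```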
Run the tanzu management-cluster create command, similar to the following:
tanzu management-cluster create --file vsphere-mc.yaml --use-existing-bootstrap-cluster