Symptoms:
- Upgrading a Tanzu Kubernetes cluster via TMC, the Tanzu CLI, or a YAML edit results in no new control plane nodes.
- Scaling the control plane node count of a Tanzu Kubernetes cluster results in no new nodes.
- The Machine Health Check for control plane nodes does not replace a broken control plane node.
- Checking etcd status on the existing nodes shows that the etcd cluster is healthy and its status is running.
- VMware Carbon Black Cloud Container Operator or another third-party security policy controller is deployed to the cluster and blocks port forwarding in the kube-system namespace.
- Logs from the CAPI kubeadm control plane manager pod show the message below. To get the logs:
On a vSphere with Tanzu Supervisor: `kubectl logs -n vmware-system-capw capi-kubeadm-control-plane-controller-manager-XXXXXXX -c manager`
On a TKG management cluster: `kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-XXXXX -c manager`
controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="foo-prod" "kubeadmControlPlane"="foo-prod-control-plane" "namespace"="default" "failures"="machine foo-prod-control-plane-g7s2c reports EtcdMemberHealthy condition is unknown (Failed to connect to the etcd pod on the foo-prod-control-plane-g7s2c node: unable to create etcd client: endpoints: [etcd-foo-prod-control-plane-g7s2c], proxy.KubeConfig.Host: https://<KUBEAPI_IP>:6443: context deadline exceeded)"
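The controller log can be long, so it may help to filter it down to the preflight message shown above. The following is a minimal sketch, not part of the original procedure: `preflight_failures` is a hypothetical helper name, and it only assumes that the log lines contain the literal strings seen in the sample message.

```shell
# Hypothetical helper: reads controller logs on stdin and keeps only the
# preflight-check lines that mention the EtcdMemberHealthy condition.
# Returns a non-zero status when no such line is found.
preflight_failures() {
  grep -F 'Waiting for control plane to pass preflight checks' \
    | grep -F 'EtcdMemberHealthy'
}

# Example usage (pod name placeholder kept as in the commands above):
# kubectl logs -n vmware-system-capw \
#   capi-kubeadm-control-plane-controller-manager-XXXXXXX -c manager \
#   | preflight_failures
```

If the filter prints nothing, the preflight failure is probably unrelated to etcd member health and this article may not apply.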
- Running `etcdctl --cluster=true endpoint health --write-out=table` on the guest cluster reports every etcd member as healthy.
- Guest cluster control plane logs show entries similar to the following:
The apiserver pod logs might report security policy violations related to port forwarding (the error below appears if the Carbon Black PortBlock security policy is applied to the kube-system namespace):
W0312 08:15:35.048653 1 dispatcher.go:161] rejected by webhook "resources.validating-webhook.cbcontainers": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue: "", RemainingItemCount: (*int64) (nil)}, Status: "Failure", Message: "admission webhook \"resources.validating-webhook.cbcontainers\" denied the request: Blocked by Kubernetes security policy \"Kube-system\".\nViolated rule(s): \n Port forward\n", Reason:"", Details: (*v1.StatusDetails) (nil), Code:400}}
On the control plane node, the output of `journalctl -xeu containerd` shows:
failure attempting to dial 127.0.0.1:2379 failed to execute portforward in network namespace "host": failed to dial 2379: dial tcp4 127.0.0.1:2379: connect: connection refused
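To check quickly whether a node's containerd journal contains this symptom, the output can be filtered for the port-forward error. This is a rough sketch under the assumption that the message text matches the sample above; `portforward_failures` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: reads containerd journal output on stdin and keeps
# only lines reporting a failed port forward to the local etcd client port.
# Returns a non-zero status when no such line is found.
portforward_failures() {
  grep -F 'failed to execute portforward' | grep -F '127.0.0.1:2379'
}

# Example usage on the control plane node:
# journalctl -xeu containerd | portforward_failures
```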