Updating TKGS from 3.3.2-embedded to 3.3.3-embedded hangs with the following error:
Configured Core Supervisor Services

Service: tkg.vsphere.vmware.com. Reason: ReconcileFailed. Message: kapp: Error: waiting on reconcile packageinstall/tanzu-cluster-api-bootstrap-kubeadm (packaging.carvel.dev/v1alpha1) namespace: svc-tkg-domain-####: Finished unsuccessfully (Reconcile failed: (message: kapp: Error: waiting on reconcile deployment/capi-kubeadm-bootstrap-controller-manager (apps/v1) namespace: svc-tkg-domain-#####: Finished unsuccessfully (Deployment is not progressing: ProgressDeadlineExceeded (message: ReplicaSet "capi-kubeadm-bootstrap-controller-manager-##########" has timed out progressing.)))).

Service: velero.vsphere.vmware.com. Status: Running

The failure is accompanied by a pod stuck in the Pending state, with scheduling events referencing unavailable host ports.
Warning  FailedScheduling  2m13s (x37452 over 26d)  default-scheduler  0/7 nodes are available: 3 node(s) didn't have free ports for the requested pod ports, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/7 nodes are available: 3 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling.

# k get po -o yaml -A | grep -i hostport | sort | uniq -c | grep -E '9875|9441|8085'
      4 hostPort: 8085
      3 hostPort: 9441
      3 hostPort: 9875

Both the capi-kubeadm-bootstrap-controller-manager pods and the Velero backup-driver pod request hostPort 8085, so no two of them can be scheduled on the same node.
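To see which pod is stuck and which host port it declares, the Pending pod can be inspected directly. This is a minimal sketch: the namespace below matches the masked one from the error above, and the pod name must be taken from your own environment.

kubectl get pods -n svc-tkg-domain-#### -o wide
kubectl describe pod <pending-capi-kubeadm-bootstrap-pod> -n svc-tkg-domain-####
kubectl get pod <pending-capi-kubeadm-bootstrap-pod> -n svc-tkg-domain-#### -o jsonpath='{.spec.containers[*].ports[*].hostPort}'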
Although the capi-kubeadm-bootstrap-controller-manager deployment is configured for two replicas, a third pod was observed in the Pending state during the rollout, most likely a temporary extra pod created while an update or restart rolled out new replicas. Velero's backup-driver pod already occupied hostPort 8085 on one node, and the two running capi-kubeadm pods occupied it on the remaining two eligible nodes, so no node could satisfy the third pod's hostPort request. This produced a scheduling deadlock that blocked the deployment rollout and stalled the upgrade.
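A quick way to confirm the deadlock described above is to list the pods that bind hostPort 8085 alongside the nodes they run on; the two Running capi-kubeadm pods and the Velero backup-driver pod should together cover all of the eligible nodes, leaving the third capi-kubeadm pod Pending with no node assigned.

kubectl get pods -A -o wide | grep -E 'capi-kubeadm-bootstrap-controller-manager|backup-driver'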
Temporarily scale down the Velero Supervisor Service's backup-driver deployment to release hostPort 8085:
kubectl scale deploy/backup-driver -n velero --replicas=0
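After scaling down, the backup-driver pod terminates and hostPort 8085 is released. One way to verify this and to watch the Pending capi-kubeadm pod get scheduled (namespace placeholder as above):

kubectl get pods -n velero
kubectl get pods -n svc-tkg-domain-#### -w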
Once the port is freed, the stuck capi-kubeadm pod schedules and the upgrade proceeds. After the rollout completes and the deployment stabilizes at two replicas, Velero can safely be scaled back up if needed; its pod will land on the third node, where hostPort 8085 is no longer in use, avoiding further conflict:
kubectl scale deploy/backup-driver -n velero --replicas=1
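To confirm that the capi-kubeadm rollout finished with two replicas and that the restored Velero pod landed on a node where hostPort 8085 is free, checks along these lines can be used (namespace placeholder as above):

kubectl rollout status deployment/capi-kubeadm-bootstrap-controller-manager -n svc-tkg-domain-####
kubectl get pods -n velero -o wide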