After an abrupt disruption of either vCenter or the Supervisor Control Plane VMs on version 8.0 U3b or later, one or more of the following symptoms appear:
message: 'admission webhook "admission.vmware.com" denied the request: Cannot add toleration {key:node-role.kubernetes.io/master, effect:NoSchedule }
Service: velero.vsphere.vmware.com. Reason: ReconcileFailed. Message: kapp: Error: Timed out waiting after 15m0s for resources: [deployment/velero-vsphere-operator-webhook (apps/v1) namespace: svc-velero-domain-c#].
Service: tkg.vsphere.vmware.com. Reason: ReconcileFailed. Message: kapp: Error: waiting on reconcile packageinstall/tanzu-cluster-api-control-plane-kubeadm (packaging.carvel.dev/v1alpha1) namespace: svc-tkg-domain-c#: Finished unsuccessfully (Reconcile failed: (message: kapp: Error: Timed out waiting after 15m0s for resources: [deployment/capi-kubeadm-control-plane-controller-manager (apps/v1) namespace: svc-tkg-domain-c#])).
YYYY-MM-DDTHH:MM:SSZ info schedext Creating RoleBinding for Group requested by sso:wcp-#####-#####-#####-#####@vsphere.local was denied
/etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on one or more Supervisor virtual machines is empty
kubectl get replicaset -n svc-tkg-domain-c#
kubectl describe replicaset -n svc-tkg-domain-c# <latest replicaset name>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 34s replicaset-controller Error creating: admission webhook "admission.vmware.com" denied the request: Cannot add toleration { key:node-role.kubernetes.io/control-plane, effect:NoSchedule } for master taint.
vCenter 8.0 U3b and higher
In rare cases of abrupt power outages or storage failures, the sync of /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist fails, and the file is truncated instead of refreshed.
This issue is resolved in vCenter 8.0 U3e and higher.
See "How to SSH into Supervisor Control Plane VMs" in "Troubleshooting vSphere Supervisor Control Plane VMs".
grep MACHINE_ID /var/lib/node.cfg
grep SSO_DOMAIN /var/lib/node.cfg
cat /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
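The cat command above shows the file contents, but an empty file prints nothing, which is easy to misread. As a minimal sketch, the hypothetical helper below (the name whitelist_status is not part of any VMware tooling) reports whether the file is missing, empty, or populated. It only reads file metadata and changes nothing.

```shell
# Hypothetical helper, not a VMware-provided command.
# Prints "missing", "empty", or "populated" for the given file.
# With no argument it checks the whitelist path named in this article.
whitelist_status() {
    file="${1:-/etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist}"
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ ! -s "$file" ]; then
        # -s is true only when the file exists and has a size greater than zero
        echo "empty"
    else
        echo "populated"
    fi
}
```

Running whitelist_status with no arguments on each Supervisor Control Plane VM should print "empty" on any VM affected by the truncation described here.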
cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.
kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL
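The placeholders <machine_id> and <sso_domain> in the heredoc above must be replaced with the values returned by the two grep commands before the file is written. As a sketch of one way to avoid hand-editing, the hypothetical function below (render_whitelist is an illustrative name) takes both values as arguments and prints the same content to standard output so it can be reviewed before being redirected into the file.

```shell
# Hypothetical helper, not a VMware-provided command.
# Prints the whitelist content from this article with the two
# placeholders filled in from the arguments.
render_whitelist() {
    machine_id="$1"
    sso_domain="$2"
    if [ -z "$machine_id" ] || [ -z "$sso_domain" ]; then
        echo "usage: render_whitelist <machine_id> <sso_domain>" >&2
        return 1
    fi
    cat <<EOL
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.
kubernetes-admin
kubeadm
system:
sso:wcp-${machine_id}@${sso_domain}
vmware-system-
EOL
}
```

After checking the printed result, the output could be redirected to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist. The values passed in are assumed to be copied by hand from the MACHINE_ID and SSO_DOMAIN lines of /var/lib/node.cfg; this sketch does not parse that file.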
crictl ps | grep schedext
b93dfeb4bf980 ed05c0dd2aa27 9 minutes ago Running wcp-schedext 10 5117c174597af kube-scheduler-<UUID>
crictl stop b93dfeb4bf980
b93dfeb4bf980
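The container ID passed to crictl stop is the first column of the crictl ps output shown above. As a minimal sketch, the hypothetical filter below (schedext_container_ids is an illustrative name) reads that output on standard input and prints the ID of every row that has a field exactly equal to wcp-schedext. It assumes the column layout matches the sample line in this article.

```shell
# Hypothetical helper, not a VMware-provided command.
# Reads `crictl ps` output on stdin and prints the first column
# (container ID) of rows containing the field "wcp-schedext".
schedext_container_ids() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "wcp-schedext") { print $1; break }
        }
    }'
}
```

For example, crictl ps | schedext_container_ids prints the ID to review before running crictl stop on it.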
kubectl get deployments -A
kubectl rollout restart deployment -n <namespace> <deployment name>
For any system PackageInstalls (PKGI) managing these deployments, it can take 10 to 15 minutes to reach the Reconcile Success state.
kubectl get pkgi -A
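When many PackageInstalls are listed, it can help to show only those that have not yet reconciled. The hypothetical filter below (pending_pkgi is an illustrative name) is a sketch under two assumptions that are not stated in this article: the first two columns of the output are the namespace and the name, and a healthy row contains the text "Reconcile succeeded". Adjust the pattern if the output in your environment differs.

```shell
# Hypothetical helper, not a VMware-provided command.
# Reads `kubectl get pkgi -A` output on stdin, skips the header row,
# and prints namespace/name for rows that do not contain the
# assumed success text "Reconcile succeeded".
pending_pkgi() {
    awk 'NR > 1 && !/Reconcile succeeded/ { print $1 "/" $2 }'
}
```

For example, kubectl get pkgi -A | pending_pkgi prints nothing once every PackageInstall has reconciled.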