After an abrupt disruption* of either vCenter or the Supervisor Control Plane VMs on version 8.0 U3b and later, the following symptoms appear:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 34s replicaset-controller Error creating: admission webhook "admission.vmware.com" denied the request: Cannot add toleration { key:node-role.kubernetes.io/control-plane, effect:NoSchedule } for master taint .
*The disruption could be a power outage, a storage event (extreme latency or failed I/O), or, in rare cases, a reboot.
vCenter 8.0 U3b and onward
In rare cases of abrupt power outages or storage failures, the sync of /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist fails, causing the file to be truncated instead of refreshed.
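To confirm that a node is affected, the whitelist file can be checked for the entries this article expects it to contain. The following is a minimal sketch, not a supported tool: the function name `whitelist_is_intact` is hypothetical, and the list of required entries is taken only from the file content given in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the whitelist file looks intact,
# fails (exit 1) if it is missing, empty, or lacks an expected entry.
whitelist_is_intact() {
    file="$1"
    # A missing or zero-length file is treated as truncated.
    [ -s "$file" ] || return 1
    for entry in kubernetes-admin kubeadm vmware-system-; do
        # -x matches whole lines only, -F treats the entry as a literal string.
        grep -qxF -- "$entry" "$file" || return 1
    done
    # The SSO entry differs per environment, so only its prefix is checked.
    grep -q '^system:sso:wcp-' "$file" || return 1
    return 0
}
```

On a Supervisor Control Plane VM it could be called as `whitelist_is_intact /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist && echo intact || echo truncated`.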
This issue will be resolved in a future version of vSphere Supervisor.
Add the following content to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on each Supervisor Control Plane node:
cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.
kubernetes-admin
kubeadm
system:sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL
Replace <machine_id> with the machine ID of the vCenter. This should be a UUID.
Run the following command on the Supervisor VM to gather MACHINE_ID:
grep MACHINE_ID /var/lib/node.cfg
Replace <sso_domain> with the domain used by vCenter's SSO (such as vsphere.local).
To find this value, grep the same file:
grep SSO_DOMAIN /var/lib/node.cfg
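The two lookups and the file content can be combined so that the placeholders are filled in automatically. This is a sketch under a stated assumption: it supposes that /var/lib/node.cfg stores these settings as `KEY=VALUE` lines, which this article does not confirm, so compare against the output of the grep commands above before relying on it. The function names `cfg_value` and `write_whitelist` are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: the config file uses KEY=VALUE lines (not confirmed by the article).
# Print the value of KEY from the config file, with quotes and trailing blanks removed.
cfg_value() {
    key="$1"; cfg="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$cfg" \
        | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Write the whitelist content from this article to OUT, using values read from CFG.
write_whitelist() {
    cfg="$1"; out="$2"
    machine_id=$(cfg_value MACHINE_ID "$cfg")
    sso_domain=$(cfg_value SSO_DOMAIN "$cfg")
    # Refuse to write a file with an empty placeholder.
    if [ -z "$machine_id" ] || [ -z "$sso_domain" ]; then
        echo "could not read MACHINE_ID or SSO_DOMAIN from $cfg" >&2
        return 1
    fi
    cat > "$out" <<EOL
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.
kubernetes-admin
kubeadm
system:sso:wcp-${machine_id}@${sso_domain}
vmware-system-
EOL
}
```

A cautious way to use it is to write to a scratch path first, for example `write_whitelist /var/lib/node.cfg /tmp/whitelist.new`, review the result, and only then copy it over the real file.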
Restart the wcp-schedext pod running on that VM after the file has been updated:
root@.... [ ~ ]# crictl ps -a | grep schedext
b93dfeb4bf980 ed05c0dd2aa27 9 minutes ago Running wcp-schedext 10 5117c174597af kube-scheduler-<UUID>
root@<UUID> [ ~ ]# crictl stop b93dfeb4bf980
b93dfeb4bf980
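The lookup and stop shown above can be scripted so the container ID does not have to be copied by hand. This is a minimal sketch: the function names are hypothetical, and the parsing assumes the `crictl ps -a` column layout seen in the sample output above (container ID first, with the state and container name as separate columns).

```shell
#!/bin/sh
# Read `crictl ps -a` output on stdin and print the ID (first column)
# of each wcp-schedext container that is in the Running state.
schedext_container_id() {
    awk '/ Running / && / wcp-schedext / { print $1 }'
}

# Stop the running wcp-schedext container(s), as the article does by hand.
restart_schedext() {
    ids=$(crictl ps -a | schedext_container_id)
    if [ -z "$ids" ]; then
        echo "no running wcp-schedext container found" >&2
        return 1
    fi
    # Left unquoted on purpose so that several IDs become separate arguments.
    crictl stop $ids
}
```

Running `restart_schedext` on each Supervisor Control Plane VM would then take the place of the two manual commands.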
Wait 10-15 minutes after the file has been updated and the wcp-schedext pod has been restarted on all 3 Supervisor Control Plane VMs. The Guest Cluster (TKG/VKS) components should then reconcile and return to a healthy state.