After the abrupt disruption* of either vCenter or the Supervisor Control Plane VM's on version 8.0U3b and onwards the following symptoms are presented.
*This could be a power-outage, storage event(extreme latency or failed i/o), or in rare cases a reboot.
vCenter 8.0 U3b and onward
In rare cases of abrupt power outages or storage failures, /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist sync fails causing the file to get truncated instead of refreshed.
Will be resolved in a future version of vSphere Supervisor
Add the following content to the /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on each Supervisor Control Plane node:
cat <<EOL >> /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.
kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL
-Replace <machine_id> with the machine ID of the vCenter. This should be a UUID.
Run the following command on the Supervisor VM to gather MACHINE_ID:
grep MACHINE_ID /var/lib/node.cfg
Replace <sso_domain> with the domain being used by VC's SSO (such as vsphere.local).
To gather this grep the same file.
grep SSO_DOMAIN /var/lib/node.cfg
Restart wcp-schedext pod running on that VM after the file has been updated:
root@.... [ ~ ]# crictl ps -a | grep schedext
b93dfeb4bf980 ed05c0dd2aa27 9 minutes ago Running wcp-schedext 10 5117c174597af kube-scheduler-<UUID>
root@<UUID> [ ~ ]# crictl stop b93dfeb4bf980
b93dfeb4bf980
Wait 10-15 minutes after the file has been updated and wcp-schedext pod restarted on all 3 Supervisor Control Plane VMs. The Guest Cluster(TKG/VKS) components should reconcile and return to a healthy state.