Mitigate error condition 'admission webhook "admission.vmware.com" denied the request'


Article ID: 382124


Updated On:

Products

Tanzu Kubernetes Runtime, vSphere with Tanzu, VMware vSphere with Tanzu

Issue/Introduction

After an abrupt disruption* of either vCenter or the Supervisor Control Plane VMs on version 8.0 U3b or later, the following symptoms appear:

  • Components such as the TKG webhook repeatedly report the following condition:
    • message: 'admission webhook "admission.vmware.com" denied the request:
      Cannot add toleration {key:node-role.kubernetes.io/master, effect:NoSchedule }

  • The Supervisor and workload clusters show a Configuring state

  • SSO authentication returns an error when attempting to access workload clusters

  • The wcp-schedext pod on the Supervisor logs the following error:
    • 2024-10-22T02:31:51.313Z info schedext Creating RoleBinding for Group requested by sso:[email protected] was denied

  • The file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on one or more Supervisor virtual machines is empty

 

*The disruption could be a power outage, a storage event (extreme latency or failed I/O), or, in rare cases, a reboot.
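
The empty-file symptom can be checked directly on each Supervisor Control Plane VM. The following is a minimal sketch, assuming a POSIX shell; the helper name `whitelist_is_empty` and the `WHITELIST` override variable are illustrative and not part of any VMware tooling.

```shell
# Path as given in this article; set WHITELIST to check a different file.
WHITELIST="${WHITELIST:-/etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist}"

# Hypothetical helper: succeeds when the file is missing or zero bytes long.
whitelist_is_empty() {
    [ ! -s "$1" ]
}

if whitelist_is_empty "$WHITELIST"; then
    echo "EMPTY or missing: $WHITELIST"
else
    echo "Has content: $WHITELIST"
fi
```

Run the check on every Supervisor Control Plane VM, since the file may be empty on only some of them.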

Environment

vCenter 8.0 U3b and later

Cause

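In rare cases, an abrupt power outage or storage failure interrupts the sync of /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist, leaving the file truncated instead of refreshed. This file lists the user prefixes that the schedext admission controller allows to modify secure annotations or tolerate the master/control plane taint, so an empty file causes those requests to be denied.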

Resolution

This issue will be resolved in a future version of vSphere Supervisor.

Workaround:

Add the following content to the file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on each Supervisor Control Plane node:

 

 

cat <<EOL >> /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL

 

 

Replace <machine_id> with the machine ID of the vCenter. This should be a UUID.
Run the following command on the Supervisor VM to gather MACHINE_ID:

grep MACHINE_ID /var/lib/node.cfg

Replace <sso_domain> with the domain used by vCenter's SSO (such as vsphere.local).
To gather this value, grep the same file:

grep SSO_DOMAIN /var/lib/node.cfg
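
The two lookups and the placeholder substitution can be combined. The sketch below assumes that /var/lib/node.cfg stores these settings as KEY=VALUE lines, which this article does not confirm; check the grep output first and adjust the pattern if the format differs. `cfg_value` and `render_whitelist` are hypothetical helper names.

```shell
# Hypothetical helper: print the value of KEY ($1) from the file ($2).
# ASSUMPTION: the file uses "KEY=VALUE" lines, optionally quoted.
cfg_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1 | tr -d "\"'"
}

# Hypothetical helper: print the whitelist body from this article with
# <machine_id> ($1) and <sso_domain> ($2) filled in.
render_whitelist() {
    cat <<EOL
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-$1@$2
vmware-system-
EOL
}

# Usage on a Supervisor VM (prints to stdout only; review before writing the file):
#   render_whitelist "$(cfg_value MACHINE_ID /var/lib/node.cfg)" \
#                    "$(cfg_value SSO_DOMAIN /var/lib/node.cfg)"
```

Printing to stdout rather than writing the file directly leaves room to confirm that the `sso:wcp-` line contains a UUID and the expected domain before the output is redirected into the whitelist file.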

Restart the wcp-schedext pod running on that VM after the file has been updated:

root@.... [ ~ ]# crictl ps -a | grep schedext
b93dfeb4bf980       ed05c0dd2aa27       9 minutes ago        Running             wcp-schedext                 10                  5117c174597af       kube-scheduler-<UUID>
root@<UUID> [ ~ ]# crictl stop b93dfeb4bf980
b93dfeb4bf980
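
The lookup and stop steps above can be scripted. The following is a sketch, assuming `crictl` is on the PATH and that the container is named `wcp-schedext` as in the sample output; `restart_schedext` is a hypothetical helper name. As in the manual steps, the container is expected to start again on its own after it is stopped.

```shell
# Hypothetical helper: stop every running wcp-schedext container on this VM.
restart_schedext() {
    # --name filters by container name; -q prints only the container IDs.
    ids=$(crictl ps --name wcp-schedext --state running -q)
    if [ -z "$ids" ]; then
        echo "no running wcp-schedext container found" >&2
        return 1
    fi
    for id in $ids; do
        crictl stop "$id"
    done
}

# Usage, after the whitelist file has been updated on this VM:
#   restart_schedext
```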

Wait 10-15 minutes after the file has been updated and the wcp-schedext pod has been restarted on all three Supervisor Control Plane VMs. The guest cluster (TKG/VKS) components should then reconcile and return to a healthy state.