Mitigate error condition 'admission webhook "admission.vmware.com" denied the request'

Article ID: 382124

Products

Tanzu Kubernetes Runtime, vSphere with Tanzu, VMware vSphere Kubernetes Service

Issue/Introduction

After an abrupt disruption* of either vCenter or the Supervisor Control Plane VMs on version 8.0 U3b and onward, the following symptoms may appear.

Components such as the TKG webhook repeatedly report the following condition:
- message: 'admission webhook "admission.vmware.com" denied the request:
  Cannot add toleration {key:node-role.kubernetes.io/master, effect:NoSchedule }
The Supervisor and workload clusters remain in a Configuring state.
An SSO authentication error is returned when attempting to access workload clusters.
The wcp-schedext pod on the Supervisor logs the following error:
- 2024-10-22T02:31:51.313Z info schedext Creating RoleBinding for Group requested by sso:[email protected] was denied
The file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on one or more Supervisor virtual machines is empty
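The empty-file symptom can be checked directly on each Supervisor Control Plane VM. The following is a minimal sketch, assuming a root shell on the VM. The function name whitelist_is_empty is hypothetical and is not part of any VMware tooling.

```shell
# Hypothetical helper: succeeds (exit 0) when the given file is missing or zero bytes long.
whitelist_is_empty() {
  [ ! -s "$1" ]
}

WHITELIST=/etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist

if whitelist_is_empty "$WHITELIST"; then
  echo "EMPTY: $WHITELIST"
else
  echo "OK: $WHITELIST has content"
fi
```

Run the check on every Supervisor Control Plane VM, since the file may be truncated on only some of them.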

The tanzu-capabilities packages become stuck in a reconcile failed state. Describing the ReplicaSet shows the following event:

Events:
  Type     Reason        Age  From                   Message
  ----     ------        ---  ----                   -------
  Warning  FailedCreate  34s  replicaset-controller  Error creating: admission webhook "admission.vmware.com" denied the request: Cannot add toleration { key:node-role.kubernetes.io/control-plane, effect:NoSchedule } for master taint .

*This could be a power outage, a storage event (extreme latency or failed I/O), or in rare cases a reboot.

Environment

vCenter 8.0 U3b and onward

Cause

In rare cases of abrupt power outages or storage failures, the sync of /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist fails, and the file is truncated instead of refreshed.

Resolution

This issue will be resolved in a future version of vSphere Supervisor.

Workaround:

Add the following content to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on each Supervisor Control Plane node:

cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL

Replace <machine_id> with the machine ID of the vCenter. This should be a UUID.
Run the following command on the Supervisor VM to find the MACHINE_ID:

grep MACHINE_ID /var/lib/node.cfg

Replace <sso_domain> with the domain used by vCenter's SSO (such as vsphere.local).
To find this value, grep the same file:

grep SSO_DOMAIN /var/lib/node.cfg
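Once both values are known, the substitution can be scripted rather than typed by hand. The sketch below assumes the two values are passed in as arguments. The function name render_whitelist is hypothetical, and the exact layout of /var/lib/node.cfg is deliberately not assumed here.

```shell
# Hypothetical helper: print the whitelist content for the given
# machine ID ($1) and SSO domain ($2) on standard output.
render_whitelist() {
  machine_id=$1
  sso_domain=$2
  cat <<EOL
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-${machine_id}@${sso_domain}
vmware-system-
EOL
}

# Example with placeholder values; substitute the values reported by the grep commands.
render_whitelist "<machine_id>" "vsphere.local"
```

Review the printed content first. Then redirect the output to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on each Supervisor Control Plane VM.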

Restart the wcp-schedext pod running on that VM after the file has been updated:

root@.... [ ~ ]# crictl ps -a | grep schedext
b93dfeb4bf980 ed05c0dd2aa27 9 minutes ago Running wcp-schedext 10 5117c174597af kube-scheduler-<UUID>
root@<UUID> [ ~ ]# crictl stop b93dfeb4bf980
b93dfeb4bf980
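When repeating this on several VMs, the container ID can be extracted from the listing instead of copied by hand. This is a minimal sketch, assuming the column layout shown in the sample output above (container ID in the first column). The function name schedext_running_ids is hypothetical.

```shell
# Hypothetical helper: read `crictl ps -a` output on stdin and print the ID
# (first column) of every wcp-schedext container in the Running state.
schedext_running_ids() {
  awk '/wcp-schedext/ && /Running/ { print $1 }'
}

# Example using the sample line from the listing above.
echo 'b93dfeb4bf980 ed05c0dd2aa27 9 minutes ago Running wcp-schedext 10 5117c174597af kube-scheduler-<UUID>' \
  | schedext_running_ids
```

On a Supervisor VM, the input would come from crictl ps -a piped into the function. Each printed ID can then be passed to crictl stop as shown above.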

Wait 10-15 minutes after the file has been updated and the wcp-schedext pod has been restarted on all 3 Supervisor Control Plane VMs. The Guest Cluster (TKG/VKS) components should then reconcile and return to a healthy state.
