Mitigate error condition 'admission webhook "admission.vmware.com" denied the request'


Article ID: 382124



Products

Tanzu Kubernetes Runtime, vSphere with Tanzu, VMware vSphere Kubernetes Service

Issue/Introduction

After an abrupt disruption of either vCenter or the Supervisor Control Plane VMs on version 8.0 U3b or later, one or more of the following symptoms appear:

  • Components such as the TKG webhook repeatedly report the following condition:
    • message: 'admission webhook "admission.vmware.com" denied the request:
      Cannot add toleration {key:node-role.kubernetes.io/master, effect:NoSchedule }

  • The Supervisor and workload clusters show in a Configuring state
    • When checking the Supervisor Cluster's configure state further, one or more errors similar to the following are present:
      The noted deployments and packageinstalls will vary depending on what system components were affected by this KB's issue.
      Service: velero.vsphere.vmware.com. Reason: ReconcileFailed. Message: kapp: Error: Timed out waiting after 15m0s for resources: [deployment/velero-vsphere-operator-webhook (apps/v1) namespace: svc-velero-domain-c#].
      
      Service: tkg.vsphere.vmware.com. Reason: ReconcileFailed. Message: kapp: Error: waiting on reconcile packageinstall/tanzu-cluster-api-control-plane-kubeadm (packaging.carvel.dev/v1alpha1) namespace: svc-tkg-domain-c#: Finished unsuccessfully (Reconcile failed: (message: kapp: Error: Timed out waiting after 15m0s for resources: [deployment/capi-kubeadm-control-plane-controller-manager (apps/v1) namespace: svc-tkg-domain-c#])).

       

  • One or more Supervisor Control Plane VMs recently ran out of root disk space or reported DiskPressure

  • An SSO authentication error occurs when attempting to access workload clusters, and the respective namespace shows a RoleBinding creation error

  • The wcp-schedext pod on the Supervisor logs the following error:
    • YYYY-MM-DDTHH:MM:SSZ info schedext Creating RoleBinding for Group requested by sso:wcp-#####-#####-#####-#####@vsphere.local was denied

  • The file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on one or more Supervisor virtual machines is empty

  • For any PackageInstall (PKGI) stuck in the ReconcileFailed state, the describe output of the latest ReplicaSet shows the following recent event:
    kubectl get replicaset -n svc-tkg-domain-c#
    
    kubectl describe replicaset -n svc-tkg-domain-c# <latest replicaset name>
    
    Events:
      Type     Reason        Age  From                   Message
      ----     ------        ---- ----                   -------
      Warning  FailedCreate  34s  replicaset-controller  Error creating: admission webhook "admission.vmware.com" denied the request: Cannot add toleration { key:node-role.kubernetes.io/control-plane, effect:NoSchedule } for master taint .
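The ReplicaSet check above can be scripted when several namespaces need to be inspected. The following is a minimal sketch, not part of the product tooling: the function name has_toleration_denial is hypothetical, and it only looks for the denial text quoted in this article on a single line of input.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a VMware tool).
# Reads text on stdin and succeeds if any line contains the
# admission.vmware.com toleration denial quoted in this article.
has_toleration_denial() {
    grep -q 'admission\.vmware\.com.* denied the request: Cannot add toleration'
}
```

A possible use is to pipe the describe output into it, for example: kubectl describe replicaset -n svc-tkg-domain-c# <latest replicaset name> | has_toleration_denial && echo "denial event present"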

Environment

vCenter 8.0 U3b and higher

Cause

In rare cases of abrupt power outages or storage failures, the sync of /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist fails, leaving the file truncated (empty) instead of refreshed.
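Because the fault is a zero-byte file, a size test is enough to tell an affected VM from a healthy one. The sketch below is an illustration only; the function name whitelist_state is hypothetical, and it reports one of three words for a given path.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): classify a whitelist file.
# Prints "missing", "empty" (zero bytes, the truncated state described
# in this article), or "populated".
whitelist_state() {
    file=$1
    if [ ! -e "$file" ]; then
        echo missing
    elif [ ! -s "$file" ]; then
        echo empty
    else
        echo populated
    fi
}
```

On a Supervisor Control Plane VM it could be called as: whitelist_state /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist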

Resolution

Resolution:

This issue is resolved in vCenter 8.0 U3e and higher.

 

Workaround:

  1. SSH to each Supervisor Control Plane VM that has encountered this issue.
  2. Retrieve the MACHINE_ID from the node configuration file:
    grep MACHINE_ID /var/lib/node.cfg

     

  3. Note down the SSO_DOMAIN:
    grep SSO_DOMAIN /var/lib/node.cfg

     

  4. Confirm that the whitelist file is empty:
    cat /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist

     

  5. For each Supervisor Control Plane VM that has an empty whitelist file, populate it with the following contents using the information from the above steps, replacing <machine_id> and <sso_domain> with the appropriate values respectively:
    cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
    # List of user-prefixes whitelisted by schedext admission controller for
    # creating or updating resources modifying secure annotations or tolerating
    # master/control plane taint.
    
    kubernetes-admin
    kubeadm
    system:
    sso:wcp-<machine_id>@<sso_domain>
    vmware-system-
    EOL

     

  6. Restart the wcp-schedext container on each affected Supervisor Control Plane VM that had an empty whitelist file. For example:
    crictl ps  | grep schedext
    b93dfeb4bf980       ed05c0dd2aa27       9 minutes ago        Running             wcp-schedext                 10                  5117c174597af       kube-scheduler-<UUID>
    
    crictl stop b93dfeb4bf980
    b93dfeb4bf980

     

  7. Check on the status of the system deployments in the Supervisor cluster:
    kubectl get deployments -A


  8. Perform a restart on each deployment that has fewer running replicas than expected:
    kubectl rollout restart deployment -n <namespace> <deployment name>

     

  9. For any system PackageInstalls (PKGI) managing these deployments, allow 10 to 15 minutes for them to reach the ReconcileSucceeded state:

    kubectl get pkgi -A

     

  10. Once all system PKGI are ReconcileSucceeded, the Supervisor cluster should reach Running state if there are no other environmental issues.
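Steps 4 and 5 can be combined so that the file is only rewritten when it is actually empty. The following is a rough sketch under stated assumptions: the function names render_whitelist and restore_whitelist_if_empty are hypothetical, and the MACHINE_ID and SSO_DOMAIN values are passed in by hand after reading them in steps 2 and 3 rather than parsed from /var/lib/node.cfg.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not VMware tooling).

# Print the whitelist contents from step 5.
# $1 = MACHINE_ID value, $2 = SSO_DOMAIN value.
render_whitelist() {
    cat <<EOL
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-${1}@${2}
vmware-system-
EOL
}

# Write the rendered contents to file $1 only if it exists and is empty.
# $2 = MACHINE_ID value, $3 = SSO_DOMAIN value.
# Returns 2 on missing arguments, 1 if the file is absent or has content.
restore_whitelist_if_empty() {
    target=$1
    if [ -z "$2" ] || [ -z "$3" ]; then
        echo "usage: restore_whitelist_if_empty <file> <machine_id> <sso_domain>" >&2
        return 2
    fi
    if [ ! -e "$target" ] || [ -s "$target" ]; then
        echo "skipped: $target is absent or not empty" >&2
        return 1
    fi
    render_whitelist "$2" "$3" > "$target"
}
```

Refusing to touch a populated file keeps the helper safe to run on every Supervisor Control Plane VM, including those that were not affected. The wcp-schedext container still needs the restart described in step 6 afterwards.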

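For steps 7 and 8 of the workaround, the deployments that need a restart can be picked out of the kubectl get deployments -A listing instead of being read by eye. This is a small sketch, assuming the default column layout (NAMESPACE, NAME, READY, UP-TO-DATE, AVAILABLE, AGE); the function name under_replicated is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption).
# Reads `kubectl get deployments -A` output on stdin and prints
# "<namespace> <name> <ready>/<desired>" for every deployment whose
# ready count is lower than its desired count. Lines whose third
# column is not of the form N/M (such as the header) are ignored.
under_replicated() {
    awk '$3 ~ /^[0-9]+\/[0-9]+$/ {
        split($3, r, "/")
        if (r[1] + 0 < r[2] + 0) print $1, $2, $3
    }'
}
```

It could be used as kubectl get deployments -A | under_replicated, followed by kubectl rollout restart deployment -n <namespace> <deployment name> for each line it prints.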
Additional Information

Related KB: Namespaces got stuck in configuring state after vCenter Server is upgraded to 8.0 Update 3b and above