Namespaces stuck in Configuring state after vCenter Server is upgraded to 8.0 Update 3b or later

Article ID: 381404

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The GUI shows the following error message on the namespace:

Failed to create RoleBinding for xxx in namespace xxxxxxx. API server returned error 'admission webhook "admission.vmware.com" denied the request: Users are allowed to create role bindings only for service accounts.'. This operation will be retried.

  • The wcpsvc log on vCenter Server contains errors like the following:

/var/log/vmware/wcp/wcpsvc.log

[YYYY-MM-DDTHH:MM:SS] debug wcp [workload/controller.go:906] [opID=svc-velero-domain-*****-workload=svc-velero-domain-*****] Reconcile role bindings done map[] [{Severity:ERROR Details:0xc025cb90e0}]
[YYYY-MM-DDTHH:MM:SS] debug wcp [workload/controller.go:906] [opID=svc-velero-domain-*****-workload=svc-velero-domain-*****] Reconcile role bindings done map[] [{Severity:ERROR Details:0xc025e29db0}]
[YYYY-MM-DDTHH:MM:SS] debug wcp [workload/controller.go:906] [opID=svc-velero-domain-*****-workload=svc-velero-domain-*****] Reconcile role bindings done map[] [{Severity:ERROR Details:0xc025edd090}]

  • Listing objects in a guest cluster fails with the following error:
Error from server (Forbidden): pods is forbidden: User "sso:[email protected]" cannot list resource "pods" in API group "" at the cluster scope
  • On one or more Supervisor control planes, the file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist is empty.

  • This issue prevents rollout of TKG components.
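The empty-file symptom can be checked from a shell on each Supervisor control plane. The following is a minimal sketch, not an official diagnostic: the function name whitelist_status is hypothetical, and it only tests whether the file exists with a non-zero size.

```shell
#!/bin/sh
# Hypothetical helper (not part of any VMware tooling): reports whether the
# given file is present and non-empty.
whitelist_status() {
    # "test -s" succeeds only if the file exists and its size is greater than zero.
    if [ -s "$1" ]; then
        echo "ok"
    else
        echo "empty-or-missing"
    fi
}

# Example call on a Supervisor control plane:
#   whitelist_status /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
```

A result of "empty-or-missing" on any control plane matches the symptom described above.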

Environment

 vCenter Server 8.x

Cause

A known issue truncates the file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist, leaving it empty.

Resolution

Fix:
 
This issue is fixed in 8.0 Update 3e.
 

Workaround:

For each Supervisor Control Plane with an empty wcp-schedext-admission-controller-user-whitelist file, perform the following steps:

  1. Retrieve the required values:

    • Get the <machine_id> from the output of:
       
      grep MACHINE_ID /var/lib/node.cfg
       
    • Get the <sso_domain> from the output of:
       
      grep SSO_DOMAIN /var/lib/node.cfg
  2. Add the following content to the file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist:

     

    # List of user-prefixes whitelisted by schedext admission controller for
    # creating or updating resources modifying secure annotations or tolerating
    # master/control plane taint.

    kubernetes-admin
    kubeadm
    system:
    sso:wcp-<machine_id>@<sso_domain>
    vmware-system-

  3. Restart the wcp-schedext pod on the Supervisor Control Plane:

    • First, locate the pod:
       
      root@******************* [ ~ ]# crictl ps -a | grep schedext
      b93dfeb4bf980       ed05c0dd2aa27       9 minutes ago  
       
    • Then, stop it using the container ID from the first column of the output:

      root@******************* [ ~ ]# crictl stop b93dfeb4bf980
      b93dfeb4bf980

    • The pod should restart automatically.
  4. Wait 10-15 minutes for TKG components to reconcile and return to a healthy state.
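The locate-and-stop part of step 3 can be sketched as a small filter. It assumes that the container ID is the first column of the crictl ps -a output and that the relevant line contains the string schedext, as in the grep above. The function name schedext_container_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "crictl ps -a" output on stdin and prints the
# first column (the container ID) of every line that mentions schedext.
schedext_container_ids() {
    awk '/schedext/ { print $1 }'
}

# Example use on a Supervisor control plane (stops every matching container):
#   crictl ps -a | schedext_container_ids | while read -r id; do
#       crictl stop "$id"
#   done
```

Because -a also lists exited containers, review the IDs the filter prints before piping them to crictl stop.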

Additional Information

If you cannot edit the file with the vi editor (vi /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist), you can write the file content with the following command. Replace <machine_id> and <sso_domain> with the values retrieved in step 1 before running it. The command uses an absolute path, so it can be run from any directory:

cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL
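To avoid editing the placeholders by hand, the same content can be produced by a small function that takes the two values as arguments. This is a sketch under assumptions: the function name render_whitelist is hypothetical, and the values must be copied manually from the grep output in step 1, because the exact format of /var/lib/node.cfg is not described here.

```shell
#!/bin/sh
# Hypothetical helper: prints the whitelist content with the placeholders
# filled in. $1 = machine ID, $2 = SSO domain, both taken from step 1.
render_whitelist() {
    machine_id="$1"
    sso_domain="$2"
    # Refuse to produce an incomplete sso: entry.
    if [ -z "$machine_id" ] || [ -z "$sso_domain" ]; then
        echo "usage: render_whitelist <machine_id> <sso_domain>" >&2
        return 1
    fi
    # Unquoted heredoc delimiter, so the two variables below are expanded.
    cat <<EOF
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-${machine_id}@${sso_domain}
vmware-system-
EOF
}
```

Run the function without redirection first to inspect the result, then redirect it to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist once the sso: line looks correct.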