vSphere Pods stuck in PodVMAnnotationsMissing or PodVMCreationFailed state in vSphere 8.0 U3b and later environments


VMware vSphere Kubernetes Service

Newly deployed vSphere Pods are stuck in the PodVMAnnotationsMissing or PodVMCreationFailed state.
The kube-scheduler log shows an admission webhook error when annotating the pod with the VM UUID:

Failed to add podVM's annotations to the pod <pod-name> in namespace <namespace name>.
Error: admission webhook "admission.vmware.com" denied the request: Cannot change VMware system annotation 'vmware-system-vm-uuid'. Will retry
The wcp-schedext log on the affected Supervisor node shows that it could not log in to vCenter:

Could not login to vCenter. Error: ServerFaultCode: The object 'vim.VirtualMachine:<vm-xxxx>' has already been deleted or has not been completely created
The file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on that Supervisor is empty.
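The empty-file symptom can be checked with a short shell function. This is a minimal sketch, not part of the official procedure: the function name check_whitelist is hypothetical, and the only path it assumes is the one named above.

```shell
#!/bin/sh
# Sketch: report whether the schedext whitelist file has any content.
# check_whitelist is a hypothetical helper name, not a VMware tool.
# The default path is the one given in this article; pass another path to override it.
check_whitelist() {
    file="${1:-/etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist}"
    # -s is true only if the file exists and is larger than zero bytes.
    if [ -s "$file" ]; then
        echo "OK: $file has content"
    else
        echo "EMPTY: $file is missing or zero-length"
        return 1
    fi
}
```

Running check_whitelist with no argument on each Supervisor Control Plane node shows which nodes are affected.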

VMware vSphere with Tanzu 8.0 U3b and above

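A fault such as a power-off or an I/O write failure can truncate the file /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist, leaving it empty. With no whitelisted user prefixes in the file, the admission webhook rejects updates to secure annotations such as vmware-system-vm-uuid.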

Workaround:

Add the following content to /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist on every affected Supervisor Control Plane node:

cat <<EOL > /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-<machine_id>@<sso_domain>
vmware-system-
EOL
Replace <machine_id> with the machine ID of the vCenter, which should be a UUID.
To find the MACHINE_ID, run the following command on the Supervisor VM:
grep MACHINE_ID /var/lib/node.cfg

Replace <sso_domain> with the domain used by vCenter SSO (such as vsphere.local).
To find it, grep the same file:

grep SSO_DOMAIN /var/lib/node.cfg
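The two lookups and the placeholder substitution can be combined into one step. The sketch below prints the whitelist content to standard output so it can be reviewed before being written to the file. It assumes that /var/lib/node.cfg stores these values as plain KEY=VALUE lines, which this article does not confirm. The function name render_whitelist is hypothetical.

```shell
#!/bin/sh
# Sketch: print the whitelist content with the placeholders filled in from node.cfg.
# ASSUMPTION: node.cfg contains plain lines of the form MACHINE_ID=<value> and
# SSO_DOMAIN=<value>. If the real format differs, adjust the sed expressions.
# render_whitelist is a hypothetical helper name.
render_whitelist() {
    cfg="${1:-/var/lib/node.cfg}"
    machine_id=$(sed -n 's/^MACHINE_ID=//p' "$cfg" | head -n 1)
    sso_domain=$(sed -n 's/^SSO_DOMAIN=//p' "$cfg" | head -n 1)
    # Refuse to print a whitelist with an empty machine ID or domain.
    if [ -z "$machine_id" ] || [ -z "$sso_domain" ]; then
        echo "could not read MACHINE_ID or SSO_DOMAIN from $cfg" >&2
        return 1
    fi
    cat <<EOF
# List of user-prefixes whitelisted by schedext admission controller for
# creating or updating resources modifying secure annotations or tolerating
# master/control plane taint.

kubernetes-admin
kubeadm
system:
sso:wcp-${machine_id}@${sso_domain}
vmware-system-
EOF
}
```

If the printed sso:wcp- line looks correct, the output can be redirected into /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist. Should the function report that it could not read the values, fall back to the manual grep commands above.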
After the file has been updated, restart the wcp-schedext pod running on that Supervisor Control Plane node:

root [ ~ ]# crictl ps -a | grep schedext
<Pod PID> 9 minutes ago Running wcp-schedext 10 kube-scheduler-<UUID>
root [ ~ ]# crictl stop <Pod PID>
<Pod PID>
Wait a few minutes, then confirm that the following message appears in the wcp-schedext log:

root [ ~ ]# crictl logs <Pod PID> | grep vCenter
Successfully connected to vCenter https://<VC-FQDN>:443/sdk
After that, redeploying the vSphere Pods should succeed.
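Instead of re-running the log check by hand, the wait can be scripted. The following is a rough sketch only: wait_for_vcenter is a hypothetical helper that re-runs whatever log command it is given and looks for the success message quoted above. The attempt count and delay are arbitrary illustration values.

```shell
#!/bin/sh
# Sketch: retry a log command until the vCenter connection message shows up.
# wait_for_vcenter is a hypothetical helper name, not a VMware tool.
# Usage: wait_for_vcenter <attempts> <delay-seconds> <command> [args...]
wait_for_vcenter() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run the supplied command and search stdout and stderr for the message.
        if "$@" 2>&1 | grep -q 'Successfully connected to vCenter'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, with the ID that crictl ps reports for the running wcp-schedext container stored in a variable named cid, `wait_for_vcenter 10 30 crictl logs "$cid"` checks the log up to ten times, 30 seconds apart, and returns non-zero if the message never appears.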
