Unable to assign VM Class to a new namespace created

Article ID: 374751

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Assigning a VM Class to a newly created namespace completes, but the namespace then goes into the 'Configuring' state.
  • The following error is shown: "Failed to create VirtualMachineClassBinding resource with name 'guaranteed-large' from workload <name>. This operation will be retried."
  • The behavior is the same for any VM Class selected for the namespace.
  • The vmclassbindings can still be listed from an SSH session on the Supervisor.
  • /var/log/vmware/wcp/wcpsvc.log contains the following error:
    YYYY-MM-DDTHH:MM:SS.393Z error wcp [workload/controller.go:1748] [opID=tsam-itsec-workload=tsam-itsec] Error in creating the VirtualMachineClassBinding resource. err: Post "https://10.1.XXX.XX:6443/apis/vmoperator.vmware.com/v1alpha1/namespaces/tsam-itsec/virtualmachineclassbindings?timeout=2m0s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    
  • The api server audit logs contain entries similar to these messages:
    /var/log/vmware/audit/apiserver-audit.log
    kubernetes/$Format wcpsvc/22207534","objectRef":{"resource":"virtualmachineclassbindings","namespace":"tsam-itsec","name":"guaranteed-medium","apiGroup":"vmoperator.vmware.com","apiVersion":"v1alpha1"},"responseStatus":{"metadata":{},"status":"Failure","message":"Internal error occurred: failed calling webhook \"capi.validating.tanzukubernetescluster.run.tanzu.vmware.com\": failed to call webhook: Post \"https://vmware-system-tkg-webhook-service.vmware-system-tkg.svc:443/capi-validate?timeout=10s\": x509: certificate signed by unknown authority","reason":"InternalError","details":{"causes":[{"message":"failed calling webhook \"capi.validating.tanzukubernetescluster.run.tanzu.vmware.com\": failed to call webhook: Post \"https://vmware-system-tkg-webhook-service.vmware-system-tkg.svc:443/capi-validate?timeout=10s\": x509: certificate signed by unknown authority"}]},"code":500},"requestReceivedTimestamp":"YYYY-MM-DDTHH:MM:SS.788856Z","stageTimestamp":"YYYY-MM-DDTHH:MM:SS.792889Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"wcp:wcpsvc:cluster-admin\" of ClusterRole \"cluster-admin\" to User \"sso:wcp-<ID>@vsphere.local\""}}
  • The TKG webhook logs contain a large number of bad certificate errors:
    $ grep 'remote error: tls: bad certificate'  ../var/log/pods/vmware-system-tkg_vmware-system-tkg-webhook-695d456ddd-8zsvt_07e16361-57ec-4e34-8628-76a70a98be4e/manager/0.log | wc -l
    31319
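
The two log checks above can be repeated with a small helper. This is a minimal sketch, not VMware tooling: the function name count_log_hits is hypothetical, and the search strings are the ones quoted in the log excerpts above.

```shell
#!/usr/bin/env bash
# count_log_hits is a hypothetical helper name, not part of any VMware tooling.
# $1           : fixed string to search for
# remaining    : one or more log files to scan
# Prints the total number of matching lines across all given files.
count_log_hits() {
  local pattern="$1"
  shift
  # -F treats the pattern as a fixed string, -h drops file name prefixes.
  # "|| true" keeps a zero-match grep from failing the pipeline.
  { grep -hF -- "$pattern" "$@" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Example calls on the Supervisor (the audit log path is from this article;
# pass the webhook pod log file of your environment as the second example's file):
#   count_log_hits 'x509: certificate signed by unknown authority' /var/log/vmware/audit/apiserver-audit.log
#   count_log_hits 'remote error: tls: bad certificate' <path to the webhook pod's manager/0.log>
```

A non-zero count for both strings matches the symptoms described in this article.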

Environment

VMware vSphere with Tanzu

Cause

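VirtualMachineClassBinding creation fails because the API server's call to the CAPI validating webhook is rejected with an "x509: certificate signed by unknown authority" error. In other words, the CA bundle in the validating webhook configuration does not match the CA that signed the webhook's serving certificate.
This can be caused by human error, for example when the webhook configuration was manually edited and incorrect syntax was introduced under the annotation.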

Resolution

  1. Check the latest issued CA certificate stored in the secret "vmware-system-tkg-webhook-service-cert":
    kubectl get secret -n vmware-system-tkg vmware-system-tkg-webhook-service-cert -o jsonpath='{.data.ca\.crt}{"\n"}'
  2. Check the certificate used by "vmware-system-tkg-validating-webhook-configuration":
    kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io vmware-system-tkg-validating-webhook-configuration -o jsonpath='{range .webhooks[*]}{.clientConfig.service.path}{"\t"}{.clientConfig.caBundle}{"\n"}{end}'
  3. If they differ, edit "vmware-system-tkg-validating-webhook-configuration" so that every certificate in it is replaced with the one stored in "vmware-system-tkg-webhook-service-cert":
    kubectl edit validatingwebhookconfigurations.admissionregistration.k8s.io vmware-system-tkg-validating-webhook-configuration
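  4. In the editor, set .clientConfig.caBundle for each of the webhook paths to the value returned in step 1. That value is already base64-encoded, which is the form caBundle uses in the manifest, so paste it unchanged.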
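
The comparison in steps 1 to 3 can also be scripted rather than done by eye. The following is a minimal sketch under stated assumptions: the function name report_ca_mismatches is hypothetical, and it expects its input in exactly the shape printed by the commands in steps 1 and 2 (one tab-separated "path, caBundle" pair per line).

```shell
#!/usr/bin/env bash
# report_ca_mismatches is a hypothetical helper name, not a VMware tool.
# $1    : base64 CA string, as printed by the command in step 1
# stdin : "<path><TAB><caBundle>" lines, as printed by the command in step 2
# Prints one line per webhook path and returns 1 if any caBundle differs.
report_ca_mismatches() {
  local expected="$1" path bundle status=0
  while IFS=$'\t' read -r path bundle || [ -n "$path" ]; do
    if [ -z "$path" ]; then
      continue
    fi
    if [ "$bundle" = "$expected" ]; then
      printf 'OK %s\n' "$path"
    else
      printf 'MISMATCH %s\n' "$path"
      status=1
    fi
  done
  return "$status"
}

# Usage on the Supervisor, reusing the commands from steps 1 and 2:
#   ca=$(kubectl get secret -n vmware-system-tkg vmware-system-tkg-webhook-service-cert -o jsonpath='{.data.ca\.crt}')
#   kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io vmware-system-tkg-validating-webhook-configuration \
#     -o jsonpath='{range .webhooks[*]}{.clientConfig.service.path}{"\t"}{.clientConfig.caBundle}{"\n"}{end}' \
#     | report_ca_mismatches "$ca"
```

Any path reported as MISMATCH is one whose caBundle needs to be updated in step 4. Running the same check again after the edit should report OK for every path.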