Reconcile Failed: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev) - Velero and TKG Supervisor services are stuck in Configuring state

Article ID: 406419

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • In the vCenter UI, the Supervisor Cluster is stuck in a "Configuring" or "Error" state, with messages such as the following:
    Configured Core Supervisor Services

    Service: velero.vsphere.vmware.com. Reason: ReconcileFailed.
    Message: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev).
    Service: tkg.vsphere.vmware.com. Reason: ReconcileFailed.
    Message: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev).
  • The issue is observed after renewing the expired admin.conf certificates.
  • Pods are in the Running state.
  • Certificates are renewed and valid.
  • Packages are in the 'Reconcile Failed' state:
    # kubectl get pkgi -A
    NAMESPACE     NAME                      PACKAGE NAME                      PACKAGE VERSION               DESCRIPTION                                                                          AGE
    <package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com  0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...  30    0d
    <package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com  0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...  35    6d
    <package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com  0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...  35    4d
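
On a Supervisor with many package installs, it can help to narrow the listing to the failing entries. The following is a minimal sketch and not part of the original procedure: the function name `failed_pkgi` is hypothetical, and it relies only on a case-insensitive text match on "Reconcile failed" in the command output.

```shell
# Hypothetical helper (not a kubectl feature): reads `kubectl get pkgi -A`
# output on stdin and keeps the header row plus every row whose text
# reports a reconcile failure.
failed_pkgi() {
  awk 'NR == 1 || tolower($0) ~ /reconcile failed/'
}

# Usage on a Supervisor control plane node (requires kubectl access):
#   kubectl get pkgi -A | failed_pkgi
```

An empty result apart from the header row would suggest that no package install is currently reporting a reconcile failure.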

Environment

vSphere with Tanzu 7.x
vSphere with Tanzu 8.x
vCenter 7.x
vCenter 8.x

Cause

The reconcile process failed as some containers did not restart following the certificate renewal.
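
The error is returned for the `data.packaging.carvel.dev` API group, which the kube-apiserver reaches through the APIService `v1alpha1.data.packaging.carvel.dev` (the name that appears in the kube-apiserver log). One way to confirm the symptom is to read that APIService's `Available` condition. This is a minimal sketch under that assumption; the function name `apiservice_available` is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) only when the named APIService
# reports its "Available" condition as True.
apiservice_available() {
  [ "$(kubectl get apiservice "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')" = "True" ]
}

# Usage on a Supervisor control plane node (requires kubectl access):
#   apiservice_available v1alpha1.data.packaging.carvel.dev \
#     && echo "APIService is available" \
#     || echo "APIService is NOT available"
```

If the APIService is not available, `kubectl describe apiservice v1alpha1.data.packaging.carvel.dev` shows the reason and message recorded for the condition.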

Resolution

  • Check that all containers were restarted after the admin.conf certificate renewal:
    # crictl ps | egrep "CONTAINER|sched|kube-controller|apiserver|etcd"
    CONTAINER      IMAGE          CREATED       STATE    NAME                     ATTEMPT  POD ID         POD
    39749a10f690b  047b0f967c8d2  26 hours ago  Running  kube-apiserver           17       79f726e01a103  kube-apiserver-##################
    1487a6a57b2ec  eb197a7cd3e35  26 hours ago  Running  etcd                     1        e5803d1f07d12  etcd-##################
    d6811f2dea26a  b12c835e886e1  26 hours ago  Running  kube-scheduler           4        37727a51d022e  kube-scheduler-##################
    32243a3f0ef81  8f4c3553c40a1  26 hours ago  Running  kube-controller-manager           ea8e6788fab4f  kube-controller-manager-##################
    9b4fbe961dd11  b2810e162a278  3 days ago    Running  wcp-schedext             6        37727a51d022e  kube-scheduler-##################
    If any of the above containers has not been restarted since the renewal, stop it; the kubelet then starts a new instance:
    # crictl stop <Container ID>
  • Verify whether both Core Supervisor Services are in the Configuring/Configured state.
  • Check the kube-apiserver logs:
    # kubectl logs -n kube-system kube-apiserver-##################
    :
    E0805 04:12:25.060384       1 controller.go:102] loading OpenAPI spec for "v1alpha1.data.packaging.carvel.dev" failed with: failed to download v1alpha1.data.packaging.carvel.dev: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time
  • Check whether the packages are able to reconcile.
  • Errors may still appear if the configuration remains problematic:
    Reason: ReconcileFailed. Message: kapp: Error: unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/v1alpha1: stale GroupVersion discovery: data.packaging.carvel.dev/v1alpha1 (possibly related issue: https://github.com/vmware-tanzu/carvel-kapp/issues/12).
  • Initiate a rollout restart of the kapp-controller deployment (note: check the APIService certificates before restarting the kapp-controller deployment):
    # kubectl rollout restart deployment -n <namespace> kapp-controller
  • Restart the vmware-system-appplatform-lifecycle-mgr-#### and kube-apiserver pods.
  • Review the vCenter WCP logs for any errors. You may come across entries similar to the ones below in the wcpsvc log.
    YYYY-MM-DDTHH:MM:SS.612Z warning wcp [kubelib/retry.go:93] [opID=SupervisorServiceController] Request to apiserver failed Err <nil>, Endpoint http://localhost:1080/external-cert/http1/<VIP IP>/6443/apis/data.packaging.carvel.dev/v1alpha1/namespaces/vmware-system-supervisor-services/packages/velero.vsphere.vmware.com.1.6.1-embedded+23741747?timeout=2m0s. Will be retried.
  • If package-related errors are observed, you may restart the WCP service on vCenter:
    # service-control --restart wcp
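
The kapp-controller restart step above can be combined with a wait, so that the packages are re-checked only after the new pod is ready. This is a minimal sketch rather than part of the documented procedure: the function name `restart_kapp_controller` is hypothetical and the 5-minute timeout is an arbitrary assumption.

```shell
# Hypothetical helper: restart the kapp-controller deployment in the given
# namespace, then block until the rollout completes or the timeout expires.
restart_kapp_controller() {
  ns="$1"
  kubectl rollout restart deployment -n "$ns" kapp-controller &&
    kubectl rollout status deployment -n "$ns" kapp-controller --timeout=5m
}

# Usage on a Supervisor control plane node (requires kubectl access):
#   restart_kapp_controller <namespace>
#   kubectl get pkgi -A
```

A non-zero exit status from the function means that either the restart request or the rollout wait failed, in which case the kapp-controller pod events and logs are the next place to look.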