Reconcile Failed: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev) - Velero and TKG Supervisor services are stuck in Configuring state

search cancel

Reconcile Failed: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev) - Velero and TKG Supervisor services are stuck in Configuring state

book

Article ID: 406419

calendar_today

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

From vCenter UI, Supervisor Cluster is stuck at "Configuring Error state" as below -

Configured Core Supervisor Services

Service: velero.vsphere.vmware.com. Reason: ReconcileFailed.
Message: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev).
Service: tkg.vsphere.vmware.com. Reason: ReconcileFailed. Message: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev).

Observed this issue after updating the expired admin.conf certificates
Pods are in running state
Certificates are renewed and valid

Packages are in 'Reconcile Failed' state

# kubectl get pkgi -A
NAMESPACE     NAME                      PACKAGE NAME                     PACKAGE VERSION               DESCRIPTION     AGE
<package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com 0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...   30    0d
<package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com 0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...   35    6d
<package-ns>  ########-kapp-controller  kapp-controller.tanzu.vmware.com 0.50.0+vmware.1-tkg.1-vmware  Reconcile failed: the server is currently unable to handle the request (get pack...   35    4d

Environment

vSphere with Tanzu 7.x
vSphere with Tanzu 8.x
vCenter 7.x
vCenter 8.x

Cause

The reconcile process failed as some containers did not restart following the certificate renewal.

Resolution

Check all Containers are restarted after admin.conf certificates renewal

# crictl ps | egrep "CONTAINER|sched|kube-controller|apiserver|etcd"
CONTAINER      IMAGE          CREATED       STATE    NAME            ATTEMPT  POD ID         POD
39749a10f690b  047b0f967c8d2  26 hours ago  Running  kube-apiserver  17       79f726e01a103  kube-apiserver-##################
1487a6a57b2ec  eb197a7cd3e35  26 hours ago  Running  etcd            1        e5803d1f07d12  etcd-##################
d6811f2dea26a  b12c835e886e1  26 hours ago  Running  kube-scheduler  4        37727a51d022e  kube-scheduler-##################
32243a3f0ef81  8f4c3553c40a1  26 hours ago  Running  kube-controller-manager  ea8e6788fab4f  kube-controller-manager-##################
9b4fbe961dd11  b2810e162a278  3 days ago    Running  wcp-schedext    6        37727a51d022e  kube-scheduler-##################

If any above containers are not restarted. Restart the containers

```
crictl stop <Container ID>
```
Verify if both Core Supervisor services are in configuring/configured state

Check kube-api-server logs:

# k logs -n kube-system kube-apiserver-##################
:
E0805 04:12:25.060384       1 controller.go:102] loading OpenAPI spec for "v1alpha1.data.packaging.carvel.dev" failed with: failed to download v1alpha1.data.packaging.carvel.dev: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time

Check the packages are able to Reconcile

Errors might still appear if the configuration remains problematic -

Reason: ReconcileFailed. Message: kapp: Error: unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/vlalpha1: stale GroupVersion discovery: data.packaging.carvel.dev/vlalpha1 (possibly related issue: https://github.com/vmware-tanzu/carvel-kapp/issues/12).

Initiate a rollout restart the kapp-controller deployment (note: check apiservice certs before kapp-controller deployment is restarted):
```
kubectl rollout restart deployment -n <namespace> kapp-controller
```
Restart the pods - vmware-system-appplatform-lifecycle-mgr-#### and kube-apiserver

Review the vCenter WCP logs for any errors. You may come across entries similar to the ones below in the wcpsvc log.

YYYY-MM-DDTHH:MM:SS.612Z warning wcp [kubelib/retry.go:93] [opID=SupervisorServiceController] Request to apiserver failed Err <nil>, Endpoint http://localhost:1080/external-cert/http1/<VIP IP>/6443/apis/data.packaging.carvel.dev/v1alpha1/namespaces/vmware-system-supervisor-services/packages/velero.vsphere.vmware.com.1.6.1-embedded+23741747?timeout=2m0s. Will be retried.

If errors related to packages are observed, you may restart the WCP service on the vCenter
- service-control --restart wcp

Feedback

thumb_up Yes

thumb_down No