vSphere Kubernetes Cluster Kubectl Commands return "data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request"

Products

VMware vSphere Kubernetes Service

vSphere with Tanzu

Issue/Introduction

From within the affected vSphere Kubernetes cluster context, kubectl commands work but return multiple error messages similar to the following:

unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request

couldn't get resource list for data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request

Deployment and deletion of packages or namespaces appear to get stuck.

Describing the affected package or namespace shows an error message similar to the following:

DiscoveryFailed  Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request

While connected to the Supervisor cluster context, the following symptoms are present:

For non-classy clusters, the TKC object of the affected vSphere Kubernetes cluster shows READY as False:

kubectl get tkc -n <affected cluster namespace>

NAMESPACE      NAME           CONTROL PLANE        WORKER     READY
my-namespace   my-cluster        #                   #        False

While connected to the affected vSphere Kubernetes cluster context, the following symptoms are present:

Installed packages (pkgi) on the affected vSphere Kubernetes cluster show a Reconcile failed state (the antrea pkgi below is an example and may vary in your environment):

kubectl get pkgi -A

NAMESPACE           NAME                    PACKAGE NAME              DESCRIPTION
vmware-system-tkg   my-cluster-antrea       antrea.tanzu.vmware.com   Reconcile failed: the server is currently unable to handle the request
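
On clusters with many installed packages, it may help to narrow the listing to the entries that are failing. The following is a minimal sketch, assuming the column layout shown above; the function name `failed_pkgi` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: reads `kubectl get pkgi -A` output on stdin and prints
# "<namespace>/<name>" for every row that contains "Reconcile failed".
# Assumes the first two columns are NAMESPACE and NAME, as in the sample above.
failed_pkgi() {
  awk 'NR > 1 && /Reconcile failed/ { print $1 "/" $2 }'
}

# Usage (run from the affected cluster context):
#   kubectl get pkgi -A | failed_pkgi
```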

Describing a pkgi in Reconcile failed state shows an error message similar to the following:

kubectl describe pkgi -n <pkgi namespace> <pkgi name>

Useful Error Message: Reconcile failed: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev)

The kapp-controller pod logs show an error message similar to the following (the v1alpha1.Package below is an example and may vary in your environment):

```
kubectl get pods -A | grep kapp
```

kubectl logs -n <kapp-controller namespace> <kapp-controller-pod>

Failed to watch *v1alpha1.Package: failed to list *v1alpha1.Package: the server is currently unable to handle the request (get packages.data.packaging.carvel.dev)

The kapp-controller apiservice reports the Available condition with Status True:

kubectl describe apiservice v1alpha1.data.packaging.carvel.dev

Status:
  Conditions:
     Last Transition Time: YYYY-MM-DDTHH:MM:SSZ
     Message: all checks passed
     Reason: Passed
     Status: True
     Type: Available

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

This can occur on a vSphere Kubernetes cluster regardless of whether it is managed by Tanzu Mission Control (TMC).

Cause

The kapp-controller pod is responsible for managing and reconciling the packages installed in a cluster. This includes the deployment and deletion of packages as well as the namespaces associated with those packages.

If there is an issue with the kapp-controller pod within the vSphere Kubernetes cluster, it will not be able to properly reconcile any installed packages and associated namespaces.

In this scenario, existing packages are unaffected and the pods deployed by each package remain healthy in Running state. Only the reconciliation performed by the kapp-controller is failing; in other words, the kapp-controller cannot perform its heartbeat or health check on the packages.

This is not an indication of an issue with the package itself.

There may have been a disconnection or other issue with the carvel apiservice within the vSphere Kubernetes cluster that caused the kapp-controller to lose communication with it. The apiservice may have since recovered on its own, but the kapp-controller is still unable to communicate with it.

Resolution

The kapp-controller pod needs to be restarted so that it syncs with the healthy carvel apiservice in the affected vSphere Kubernetes cluster.

Note: This KB is only applicable if the data.packaging.carvel.dev apiservice is healthy in the affected vSphere Kubernetes cluster. If this apiservice is unavailable, it must be investigated and restored to Available state before performing the following steps.
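
The precondition in the note above could be checked from the command line before restarting anything. This is a sketch only, assuming the apiservice name `v1alpha1.data.packaging.carvel.dev` shown earlier in this article; the function name `carvel_apiservice_available` is hypothetical.

```shell
# Hypothetical helper: succeeds only when the carvel data-packaging APIService
# reports an Available condition whose status is True.
carvel_apiservice_available() {
  avail="$(kubectl get apiservice v1alpha1.data.packaging.carvel.dev \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')"
  [ "$avail" = "True" ]
}

# Usage (run from the affected cluster context):
#   carvel_apiservice_available && echo "apiservice is Available; this KB applies"
```

If the function returns a non-zero status, the apiservice itself likely needs attention first and the restart steps below may not help.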

Connect to the affected cluster's context.

Confirm the status of the kapp-controller pod and deployment:

```
kubectl get pod,deploy -A | grep kapp
```

Perform a rolling restart on the kapp-controller deployment:

kubectl rollout restart deploy kapp-controller -n <kapp controller namespace>

Confirm that the kapp-controller pod was restarted successfully:
```
kubectl get pods -A | grep kapp
```
It may take a minute for the kapp-controller pod to start up, stabilize and properly reconcile the installed packages in the cluster.
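
Rather than re-running the pod listing by hand, the wait could be delegated to `kubectl rollout status`, which blocks until the deployment finishes rolling out. A minimal sketch follows; the function name `wait_for_kapp_rollout` and the 120s default timeout are assumptions chosen for illustration, not values from this article.

```shell
# Hypothetical helper: blocks until the restarted kapp-controller deployment
# reports a completed rollout, or returns non-zero once the timeout expires.
#   $1 = namespace of the kapp-controller deployment (from the earlier step)
#   $2 = optional timeout, defaults to 120s (an arbitrary illustrative value)
wait_for_kapp_rollout() {
  kubectl rollout status deployment/kapp-controller -n "$1" --timeout="${2:-120s}"
}

# Usage (run from the affected cluster context):
#   wait_for_kapp_rollout <kapp controller namespace>
```

Note that a completed rollout only shows the new pod is ready; package reconciliation may still take a little longer.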

Check that the pkgi in the cluster now show a Reconcile succeeded state (the antrea pkgi below is an example and may vary in your environment):

kubectl get pkgi -A


NAMESPACE            NAME                 PACKAGE NAME                 DESCRIPTION
vmware-system-tkg    my-cluster-antrea    antrea.tanzu.vmware.com      Reconcile succeeded
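
The final check above can also be expressed as a pass/fail test, which may be convenient in a script. This is a sketch under the assumption that every healthy row contains the text "Reconcile succeeded" as in the sample output; the function name `all_pkgi_reconciled` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pkgi -A` output on stdin and exits 0
# only if every non-empty row after the header contains "Reconcile succeeded".
all_pkgi_reconciled() {
  awk 'NR > 1 && NF && !/Reconcile succeeded/ { bad = 1 } END { exit bad + 0 }'
}

# Usage (run from the affected cluster context):
#   kubectl get pkgi -A | all_pkgi_reconciled && echo "all packages reconciled"
```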