PackageInstall (pkgi) Fails with “Unable to Retrieve Server APIs” Due to Expired Control Plane Certificates

Article ID: 399977

Products

  • VMware Tanzu Kubernetes Grid Management
  • VMware Cloud Director

Issue/Introduction

Customers may encounter failed PackageInstall (pkgi) resources or cluster component failures during compatibility updates or upgrade preparation workflows for VMware Cloud Director Container Service Extension. The error typically occurs during API discovery and appears as:

“unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request”

In affected clusters, associated logs may also show client-side throttling messages and delays when attempting to contact the Kubernetes API server. The failure affects functionality such as kapp-controller reconciliation and prevents scripts or automated tools from successfully completing.

Within the affected cluster, the failure is visible in the output of:

# kubectl get pkgi -A

# kubectl describe pkgi <pkgi-name> -n <pkgi-namespace>
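
On clusters with many packages, it can help to narrow the listing to the PackageInstalls that are not healthy. The following is a minimal sketch, not part of the documented procedure. It assumes that healthy rows show "Reconcile succeeded" in the DESCRIPTION column, and the function name unhealthy_pkgi is hypothetical.

```shell
#!/usr/bin/env bash
# Keep only PackageInstall rows whose DESCRIPTION is not "Reconcile succeeded".
# Reads the output of `kubectl get pkgi -A --no-headers` on stdin.
# Assumption: healthy rows contain the literal text "Reconcile succeeded".
unhealthy_pkgi() {
  # grep exits non-zero when nothing is left; treat "no unhealthy rows" as success.
  grep -v 'Reconcile succeeded' || true
}

# Usage against a live cluster (left commented out so the sketch has no side effects):
# kubectl get pkgi -A --no-headers | unhealthy_pkgi
```

Each name and namespace it prints can then be passed to kubectl describe pkgi to read the full error.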

Environment

  • VMware Cloud Director
  • VMware Tanzu Kubernetes Grid
  • VMware Cloud Director Container Service Extension

Cause

The underlying cause is expired TLS certificates on long-running control plane components, including kube-apiserver, etcd, kube-controller-manager, and kube-scheduler. In environments where restarts of these components did not occur following certificate rotation (typically at the 365-day mark), some pods may continue running with expired certificates.

This results in partial cluster failure:

  • Some components communicate using valid rotated certificates.
  • Others fail due to invalid or expired certs, leading to reconcile failures, API discovery timeouts, and degraded controller behavior.

This is most clearly observed by checking pod age in the kube-system namespace. If one or more control plane pods are significantly older than the cert rotation window (e.g., 384+ days vs. others at 19 days), it indicates those pods did not restart during rotation.

The kube-apiserver, etcd, kube-controller-manager, and kube-scheduler pods show an "Age" older than the date on which the cluster's Kubernetes component certificates were rotated:

# kubectl get pods -A
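
The age comparison described above can also be done from creation timestamps rather than by reading the "Age" column. The following is a minimal sketch under stated assumptions: GNU date is available, the function name flag_stale_pods is hypothetical, and the control plane pods carry the tier=control-plane label that kubeadm normally applies to its static pods.

```shell
#!/usr/bin/env bash
# Print pods created more than MAX_AGE_DAYS ago (default 365).
# Reads "NAME CREATION_TIMESTAMP" pairs on stdin, timestamps in RFC 3339 form.
# Assumption: GNU date (the -d option) is available.
flag_stale_pods() {
  local max_age_days="${1:-365}"
  local now name created created_s age_days
  now=$(date -u +%s)
  while read -r name created || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    # Skip rows whose timestamp cannot be parsed.
    created_s=$(date -u -d "$created" +%s 2>/dev/null) || continue
    age_days=$(( (now - created_s) / 86400 ))
    if [ "$age_days" -ge "$max_age_days" ]; then
      echo "$name ${age_days}d"
    fi
  done
}

# Usage against a live cluster (commented out; the label selector is an assumption):
# kubectl get pods -n kube-system -l tier=control-plane --no-headers \
#   -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
#   | flag_stale_pods 365
```

Any pod it prints has been running for at least the given number of days and is a candidate for the restart described in the Resolution section.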

Resolution

  1. Check the age of control plane component pods:
    • Look in the kube-system namespace for:
      • kube-apiserver
      • etcd
      • kube-controller-manager
      • kube-scheduler
    • If any pods are older than 365 days and others are newer, certificate expiration is likely the issue.
  2. Check the certificate expiration by logging into the cluster's control plane node and issuing the command:
    • kubeadm certs check-expiration
      • If certificates have been rotated, proceed to step 3. Otherwise follow the certificate rotation process.
  3. Restart stale control plane pods:
    • Safely restart the outdated pods in the kube-system namespace to ensure they pick up the latest valid certificates.
  4. Restart the kapp-controller:
    • This ensures it reloads the new certs and can resume reconciliation.
  5. Recheck the state of pkgi resources:
    • Previously failed package installs (e.g., antrea.tanzu.vmware.com) should now reconcile successfully.
  6. Retry any previously failed automation scripts or upgrades.
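
Steps 2 through 4 can be sketched in shell as follows. This is an illustration, not the documented procedure. It assumes a kubeadm-built control plane node where the components run as static pods with manifests in /etc/kubernetes/manifests, root access on that node, and openssl installed. The function names, the 30-second wait, and the kapp-controller namespace are assumptions to adjust for the environment.

```shell
#!/usr/bin/env bash
# Sketch only: run on a control plane node as root, one component at a time.

# Step 2 helper: exit 0 if the certificate file is still valid N seconds from now
# (default 0, meaning "valid right now"). A missing or unreadable file also fails.
cert_valid_for() {
  local cert="$1" seconds="${2:-0}"
  openssl x509 -checkend "$seconds" -noout -in "$cert" >/dev/null 2>&1
}

# Step 3 helper: restart one static pod by moving its manifest out of the
# kubelet's manifest directory and back, so the process reloads its certificates.
bounce_static_pod() {
  local component="$1"
  local manifest_dir="${2:-/etc/kubernetes/manifests}"   # assumed kubeadm default
  local wait_seconds="${3:-30}"                          # assumed; long enough for the kubelet to notice
  local manifest="$manifest_dir/$component.yaml"
  local holding
  [ -f "$manifest" ] || { echo "no manifest: $manifest" >&2; return 1; }
  holding=$(mktemp -d) || return 1
  mv "$manifest" "$holding/" || return 1
  sleep "$wait_seconds"                                  # kubelet stops the pod
  mv "$holding/$component.yaml" "$manifest" || return 1
  rmdir "$holding"
  sleep "$wait_seconds"                                  # kubelet recreates the pod
}

# Example run (commented out so the sketch has no side effects):
# cert_valid_for /etc/kubernetes/pki/apiserver.crt || echo "rotate certificates first"
# for c in etcd kube-apiserver kube-controller-manager kube-scheduler; do
#   bounce_static_pod "$c" || break
# done
# Step 4; the namespace is an assumption, locate it with:
#   kubectl get deployments -A | grep kapp-controller
# kubectl rollout restart deployment/kapp-controller -n tkg-system
```

Restarting one component per node at a time keeps the remaining control plane nodes serving requests while each pod is recreated. On a single control plane node, expect a brief API outage during the etcd and kube-apiserver restarts.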

Additional Information