vSphere Kubernetes Service upgrade failed with "kapp: error waiting on reconcile packageinstall"

Article ID: 431379

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • vSphere Kubernetes Service (VKS) was upgraded from 3.5.0 to 3.6.0.
  • The image registry in the VKS 3.6.0 YAML was replaced with a local Harbor registry:

    - imgpkgBundle:
        image: <local-harbor-address>/vsphere/iaas/vsphere-kubernetes-service/3.6.0/vsphere-kubernetes-service:3.6.0

  • The VKS status shows the following error:

Reason: ReconcileFailed
Message: kapp: Error: waiting on reconcile packageinstall/runtime-extension (packaging.carvel.dev/v1alpha1) namespace: svc-tkg-domain-c##: Finished waiting unsuccessfully: Reconcile failed: message: kapp: Error: Timed out waiting after 15m0s for resources: [deployment/machine-agent-server (apps/v1) namespace: svc-tkg-domain-c## deployment/runtime-extension-controller-manager (apps/v1) namespace: svc-tkg-domain-c##.]

  • Many PackageInstalls (pkgi) in the namespace 'svc-tkg-domain-c##' are in the ReconcileFailed state with an error similar to the one above.

         kubectl get pkgi -n svc-tkg-domain-c##

  • Describing the App of a failed pkgi shows that Pods are not allowed to run in this namespace. For example:

         kubectl describe app -n svc-tkg-domain-c## upgrade-compatibility-service

8:47:52AM: Deployment has encountered replica failure: FailedCreate, message: admission webhook "validate.cpvmopd.applplatform.vmware.com" denied the request: pod svc-tkg-domain-c##/upgrade-compatibility-service-######-###### in an untrusted namespace on the control plane VM is not allowed.

  • The logs of the applplatform operator Pod indicate that the package bundles are not trusted:

         kubectl logs -n vmware-system-applplatform-operator-system vmware-system-applplatform-operator-mgr-0

stderr F I0224 hh:mm:ss 1 first_party_trust.go:119] "msg"="Package bundle is not trusted" "namespace"="svc-tkg-domain-c##" "serviceID"="tkg" "verification"="tkg.3.6.0-signature-verification-#####" "version"="3.6.0+v1.35"
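
When many PackageInstalls are affected, it can help to reduce the `kubectl get pkgi` table to just the failing names. The following is a minimal sketch, not part of the original article: the helper name `failed_pkgi` is hypothetical, and it assumes the table has a header row, the name in the first column, and a status text containing "Reconcile failed" or "ReconcileFailed".

```shell
#!/bin/sh
# failed_pkgi (hypothetical helper): read `kubectl get pkgi` table output on
# stdin and print the first column (the name) of every row whose text mentions
# a failed reconcile. The header row (line 1) is skipped.
failed_pkgi() {
  awk 'NR > 1 && tolower($0) ~ /reconcile ?failed/ { print $1 }'
}
```

Usage, with cluster access: `kubectl get pkgi -n svc-tkg-domain-c## | failed_pkgi`. Each printed name can then be passed to `kubectl describe app` as shown above.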

Environment

VMware vSphere Kubernetes Service

Cause

The VKS 3.6.0 package bundles in the local registry do not contain the cosign signatures, so Supervisor no longer trusts them. This happens when the package bundles are copied with imgpkg without the '--cosign-signatures' flag.
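
For orientation: by cosign's own convention, the signature for an image is stored in the same repository under a tag derived from the image digest, `sha256-<hex>.sig`. The sketch below only derives that tag name from a digest so it can be looked for in the local Harbor project. It is an illustration under assumptions: the helper name `cosign_sig_tag` is hypothetical, and this article does not confirm that Supervisor's trust check looks up exactly this tag.

```shell
#!/bin/sh
# cosign_sig_tag (hypothetical helper): map an image digest of the form
# sha256:<hex> to the tag cosign conventionally uses for its signature,
# sha256-<hex>.sig. Any other input is rejected with a non-zero status.
cosign_sig_tag() {
  case "$1" in
    sha256:*) printf 'sha256-%s.sig\n' "${1#sha256:}" ;;
    *) echo "expected a sha256:<hex> digest" >&2; return 1 ;;
  esac
}
```

If no such tag exists next to a bundle or image digest in the local registry, that is consistent with the bundle having been copied without signatures.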

Resolution

  1. Download the package bundles again with imgpkg, adding the '--cosign-signatures' flag. See Generate the VKS Binary Package.

    For example:

    imgpkg copy -b projects.packages.broadcom.com/vsphere/iaas/vsphere-kubernetes-service/3.6.0/vsphere-kubernetes-service:3.6.0 --to-tar vks-v3.6.0.tar --cosign-signatures

  2. Upload the package bundles to the local Harbor registry again to overwrite the existing VKS 3.6.0 package bundles. See Upload the VKS Binary to the Private Registry.
  3. Restart the Pod vmware-system-applplatform-operator-mgr-0:

    kubectl rollout restart sts -n vmware-system-applplatform-operator-system vmware-system-applplatform-operator-mgr

  4. After a few minutes, confirm that all packages in the namespace svc-tkg-domain-c## have reconciled successfully:

    kubectl get pkgi -n svc-tkg-domain-c##

    Monitor for a few minutes and ensure that the status changes from ReconcileFailed to ReconcileSucceeded.
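
Steps 2 and 4 can be sketched as two small shell helpers. This is a hedged illustration rather than the documented procedure: the function names are hypothetical, the upload uses the general `imgpkg copy --tar ... --to-repo ...` form, and any additional flags your registry requires (for example a CA certificate path) should be taken from the linked documentation. The wait helper assumes the status text in the `kubectl get pkgi` table contains "Reconcile succeeded" or "ReconcileSucceeded".

```shell
#!/bin/sh
# upload_vks_bundle (hypothetical helper): push a bundle tarball created in
# step 1 to a repository in the local registry.
# Arguments: path to the tarball, destination repository (without a tag).
upload_vks_bundle() {
  imgpkg copy --tar "$1" --to-repo "$2"
}

# wait_pkgi_reconciled (hypothetical helper): poll `kubectl get pkgi` until
# every row reports a successful reconcile, or give up after the attempts run out.
# Arguments: namespace, max attempts (default 30), seconds between attempts (default 20).
wait_pkgi_reconciled() {
  ns="$1"; attempts="${2:-30}"; interval="${3:-20}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Only evaluate the table if kubectl itself succeeded.
    if out="$(kubectl get pkgi -n "$ns")"; then
      # Count data rows (header skipped) that do not yet report success.
      pending="$(printf '%s\n' "$out" |
        awk 'NR > 1 && tolower($0) !~ /reconcile ?succeeded/ { n++ } END { print n + 0 }')"
      if [ "$pending" -eq 0 ]; then
        echo "all PackageInstalls in $ns report a successful reconcile"
        return 0
      fi
      echo "attempt $i/$attempts: $pending PackageInstall(s) not yet reconciled"
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `upload_vks_bundle vks-v3.6.0.tar "<local-harbor-address>/vsphere/iaas/vsphere-kubernetes-service/3.6.0/vsphere-kubernetes-service"` followed, after the restart in step 3, by `wait_pkgi_reconciled "svc-tkg-domain-c##"`. Quote both arguments, since `<`, `>` and the placeholder characters are otherwise interpreted by the shell.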