Supervisor Service Upgrade Failed due to Fetch Image Failures - Error: Syncing directory with imgpkgbundle contents - MANIFEST_UNKNOWN

Article ID: 403681

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When attempting to upgrade a vSphere Supervisor service, the service's configuration status reaches an error state with a message indicating failure to pull an image from the container registry listed in that service version's Package manifest.

This can occur when upgrading the VKS service, the TKG service, or any other available Supervisor Service.

The error message will be similar to one of the following, where values in angle brackets <> vary by environment and by the Supervisor Service that failed:

Configured Core Supervisor Service

service: <supervisor service>. Reason: ReconcileFailed. Message: kapp: error waiting on reconcile packageinstall/<pkgi which will vary based on the supervisor service> (packaging.carvel.dev/v#alpha#)
namespace: svc-tkg-domain-c<id>: Finished unsuccessfully (Reconcile Failed: (message vendir: Error: Syncing directory '0': Syncing Directory '.' with imgpkgBundle contents: Fetching image: Get https://localhost:5000/v2/<container.registry.hostname>/manifest/sha256:<package information>: MANIFEST_UNKNOWN: manifest unknown; map[]

Reason: ReconcileFailed. Message: vendir: Error: Syncing directory '0': Syncing directory '.' with imgpkgBundle contents: Fetching image: GET https://<container.registry.hostname>/v#/vsphere/supervisor/packages/YYYY.MM.DD/vks-standard-packages/manifests/fake: MANIFEST_UNKNOWN: The named manifest is not known to the registry.; map[manifest:vsphere/supervisor/packages/YYYY.MM.DD/vks-standard-packages] ."
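When collecting details for troubleshooting, it can help to pull the registry host, repository path, and error code out of the message. The following Python snippet is a minimal sketch of that, not a VMware tool. The function name parse_fetch_error is hypothetical, the sample message uses a placeholder registry (registry.example.com) and date, and it assumes the request path follows the /v2/<repository>/manifests/<reference> layout.

```python
import re
from urllib.parse import urlsplit

# Matches the "Fetching image: GET <url>: <CODE>: <detail>" part of a vendir error.
_FETCH_RE = re.compile(
    r"Fetching image:\s+(?:GET|Get)\s+(?P<url>https?://\S+?):\s+"
    r"(?P<code>[A-Z_]+):\s+(?P<detail>[^;]*)"
)

# Assumed request path layout: /v2/<repository>/manifests/<reference>
_PATH_RE = re.compile(r"^/v2/(?P<repo>.+)/manifests/(?P<ref>[^/]+)$")


def parse_fetch_error(message):
    """Return registry, repository, reference, code and detail, or None if no match."""
    match = _FETCH_RE.search(message)
    if match is None:
        return None
    parts = urlsplit(match.group("url"))
    path = _PATH_RE.match(parts.path)
    return {
        "registry": parts.netloc,
        "repository": path.group("repo") if path else None,
        "reference": path.group("ref") if path else None,
        "code": match.group("code"),
        "detail": match.group("detail").strip(),
    }


# Hypothetical sample message modeled on the second error above.
SAMPLE = (
    "Reason: ReconcileFailed. Message: vendir: Error: Syncing directory '0': "
    "Syncing directory '.' with imgpkgBundle contents: Fetching image: GET "
    "https://registry.example.com/v2/vsphere/supervisor/packages/2025.01.01/"
    "vks-standard-packages/manifests/fake: MANIFEST_UNKNOWN: "
    "The named manifest is not known to the registry.; map[]"
)

info = parse_fetch_error(SAMPLE)
```

The registry and repository values show where the Supervisor actually tried to pull from, which can then be compared against where the images were relocated.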

The Core Supervisor Services are the VKS service, the TKG service, and the Velero Operator. The VKS service or TKG service is responsible for workload cluster management.

Environment

vCenter 8.0u3 and higher

vSphere Kubernetes Service (VKS) v3.0.0 and higher

Cause

Failure messages concerning image resolution and image fetch operations can appear in the service's configuration status for a number of reasons. Some possible reasons:

  • A private container image registry was used but images were not relocated there prior to service install or upgrade.
  • A private container image registry was used but is not reachable from the Supervisor's management network.
  • A private container image registry using self-signed certificates for TLS is not trusted by the Supervisor.
  • A private container image registry requiring authentication for image pulls was used to host the images but Supervisor has not been configured with authentication credentials.
  • The Supervisor is configured with a proxy server that cannot reach the container registry hosting the images. 
  • An intranet in the environment has a firewall or proxy server intercepting TLS requests and substituting its own certificates which Supervisor is not configured to trust.
  • The Supervisor service YAML was not updated correctly to pull its images from the private container image registry.
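Several of the causes above (unreachable registry, untrusted certificate, missing credentials) can often be told apart with a single request to the registry's /v2/ endpoint. The Python sketch below is one hedged way to do this using only the standard library. It is not a VMware tool, and probe_registry and classify_status are hypothetical helper names. The result only approximates what the Supervisor sees if the script runs on a machine with the same network path, proxy settings, and trusted certificates, which is an assumption to verify in your environment.

```python
import ssl
import urllib.error
import urllib.request


def classify_status(status):
    """Interpret the HTTP status returned by GET https://<registry>/v2/."""
    if status == 200:
        return "registry reachable, anonymous access accepted"
    if status == 401:
        return "registry reachable, authentication required"
    if status == 404:
        return "endpoint reachable, but no registry API found at /v2/"
    return f"unexpected HTTP status {status}"


def probe_registry(host, timeout=5.0):
    """Probe a registry host and return a short description of the outcome.

    urllib honors proxy environment variables such as HTTPS_PROXY, so the
    result also depends on the proxy configuration of the machine used.
    """
    url = f"https://{host}/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be handled first.
        return classify_status(err.code)
    except urllib.error.URLError as err:
        if isinstance(err.reason, ssl.SSLCertVerificationError):
            return f"TLS certificate not trusted: {err.reason}"
        return f"connection failed: {err.reason}"
    except OSError as err:
        # Covers timeouts and socket errors raised outside of URLError.
        return f"connection failed: {err}"
```

For example, probe_registry("registry.example.com") (a placeholder hostname) returns one of the strings above. A certificate error points at the self-signed or TLS-intercepting cases, and an authentication result points at missing pull credentials.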

Note: By default, the Supervisor inherits the vCenter Server Appliance proxy settings and will attempt to use them unless configured otherwise.

See Configure the Supervisor to Use a Proxy for details.

Resolution

  1. Review the Cause section above to determine whether any of the listed misconfigurations or networking issues apply to your environment, and correct them accordingly.

  2. Contact VMware by Broadcom Support to fully scope the issue and determine whether a revert is safe.
    • In some cases the revert process does more harm than good to the Supervisor service and adds significant complexity to the overall repair.

 

IMPORTANT: Do not perform kubectl deletions of Supervisor Service package installs (pkgi).

Manual deletion through kubectl will result in downtime and potentially irrecoverable state of the environment.

The Core Supervisor Services under the namespace vmware-system-supervisor-services are critical for the environment to function and manage workload clusters.

Deletion of the TKG or VKS supervisor service pkgi will lead to the deletion of all workload clusters in the environment.

 

Manage Supervisor Services from Workload Management in the vSphere web UI, which does not allow Core Supervisor Services to be deleted.

Additional Information

vSphere Supervisor Services Github

Airgapped vSphere Supervisor Guide on Github

--------------------

  lastAttemptedVersion: 1.9.3+vmware.0
  observedGeneration: 3
  usefulErrorMessage: |-
    Stopped installing matched version '1.7.4+vmware.0' since last attempted version '1.9.3+vmware.0' is higher.
    hint: Add annotation packaging.carvel.dev/downgradable: "" to PackageInstall to proceed with downgrade
  version: 1.7.4+vmware.0

This procedure is intended only to revert a failed Supervisor service upgrade. It should not be used as a general-purpose mechanism to downgrade services. Service downgrade is intentionally not permitted by vCenter and for some services it may render instances of their managed resources orphaned. This risk doesn't apply if the current (failed) version was never successfully upgraded and rolled out.
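
The status excerpt above can be recognized programmatically when triaging a failed upgrade, for example before opening a support case. The following Python sketch only detects the condition and extracts the two versions. It does not add the annotation or change anything. The function name blocked_downgrade is hypothetical, and the pattern assumes the message wording shown in the excerpt.

```python
import re

# Pattern taken from the usefulErrorMessage wording in the excerpt above.
_DOWNGRADE_RE = re.compile(
    r"Stopped installing matched version '(?P<matched>[^']+)' "
    r"since last attempted version '(?P<attempted>[^']+)' is higher"
)


def blocked_downgrade(useful_error_message):
    """Return (matched_version, last_attempted_version), or None if not a blocked downgrade."""
    match = _DOWNGRADE_RE.search(useful_error_message)
    if match is None:
        return None
    return match.group("matched"), match.group("attempted")


MESSAGE = (
    "Stopped installing matched version '1.7.4+vmware.0' since last attempted "
    "version '1.9.3+vmware.0' is higher.\n"
    'hint: Add annotation packaging.carvel.dev/downgradable: "" to '
    "PackageInstall to proceed with downgrade"
)

versions = blocked_downgrade(MESSAGE)
```

Here versions holds the version the PackageInstall is pinned to and the higher version that was last attempted, which are the two values to include when scoping the issue with support.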