Updating a Supervisor cluster can stall because the upgrade script references a stale content library.
Starting with 8.0 U2, the upgrade script has a known issue that is exposed when a Supervisor namespace references a content library that no longer exists (for example, because it was removed from vCenter).
The upgrade script attempts to access a field in the CRD status that is not present because the ContentLibrary resource failed to reconcile (for example, because the content library was deleted later).
8.0 U2
The imageregistry-controller-manager log shows the following errors for the unreachable content library:
/var/log/pods/vmware-system-imageregistry_vmware-system-imageregistry-controller-manager-<suffix>/manager/0.log
YYYY-MM-DDTHH:MM:SS.067759754Z stderr F E0605 09:38:27.067636 1 content_library_provider.go:56] vsphere/contentlibrary "msg"="content library does not exist" "error"="GET https://<vc-fqdn>:443/rest/com/vmware/content/library/id:5f773a1c-5aa3-4268-871a-####: 404 Not Found" "libraryUUID"="5f773a1c-5aa3-4268-871a-####"
YYYY-MM-DDTHH:MM:SS.067864156Z stderr F E0605 09:38:27.067783 1 contentlibrary_controller.go:172] controllers/ContentLibrary "msg"="Failed to update ContentLibrary status" "error"="The underlying content library with ID 5f773a1c-5aa3-4268-871a-#### does not exist in vSphere" "clName"="cl-############"
YYYY-MM-DDTHH:MM:SS.06788166Z stderr F E0605 09:38:27.067821 1 contentlibrary_controller.go:154] controllers/ContentLibrary "msg"="Failed to reconcile ContentLibrary" "error"="The underlying content library with ID 5f773a1c-5aa3-4268-871a-#### does not exist in vSphere" "clName"="cl-############"
YYYY-MM-DDTHH:MM:SS.076775801Z stderr F E0605 09:38:27.076666 1 controller.go:329] "msg"="Reconciler error" "error"="The underlying content library with ID 5f773a1c-5aa3-4268-871a-#### does not exist in vSphere" "ContentLibrary"={"name":"cl-############","namespace":"<namespace>"} "controller"="contentlibrary" "controllerGroup"="imageregistry.vmware.com" "controllerKind"="ContentLibrary" "name"="cl-############" "namespace"="<namespace>" "reconcileID"="49c67465-cd87-4dd2-8ef3-9f7b27611f18"
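To find which library IDs are affected, the log can be scanned for the "content library does not exist" entries. The following is a minimal Python sketch, assuming the log format shown above. The function name find_missing_library_ids is illustrative and is not part of any VMware tooling.

```python
import re

# Matches the "content library does not exist" entries and captures the
# value of the "libraryUUID" field that follows on the same line.
MISSING_LIBRARY = re.compile(
    r'"msg"="content library does not exist".*"libraryUUID"="([^"]+)"'
)


def find_missing_library_ids(log_lines):
    """Return the distinct library IDs reported as missing, in first-seen order."""
    missing = []
    for line in log_lines:
        match = MISSING_LIBRARY.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing
```

Passing an open handle to the manager 0.log file (or any iterable of log lines) returns the IDs to look for in the namespace configuration.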
ImageRegistryUpgrade fails with a TypeError in the component upgrade log:
/var/log/vmware/upgrade-ctl-compupgrade.log
YYYY-MM-DDTHH:MM:SS.580Z ERROR compupgrade: {"error": "TypeError", "message": "argument of type 'NoneType' is not iterable", "backtrace": [" File \"/usr/lib/vmware-wcp/upgrade/compupgrade.py\", line 362, in do_upgrade_with_out_resume_failed_support\n comp.doUpgrade(upCtx)\n", " File \"/usr/lib/vmware-wcp/objects/image-registry-operator/imageregistry_upgrade.py\", line 323, in doUpgrade\n self.updateV1alpha1ImageRegistryResources()\n", " File \"/usr/lib/vmware-wcp/objects/image-registry-operator/imageregistry_upgrade.py\", line 171, in updateV1alpha1ImageRegistryResources\n self.updateV1alpha1Resource('contentlibraries', True)\n", " File \"/usr/lib/vmware-wcp/objects/image-registry-operator/imageregistry_upgrade.py\", line 193, in updateV1alpha1Resource\n patch_list = self.getResourcePatchBody(status, resource_kind)\n", " File \"/usr/lib/vmware-wcp/objects/image-registry-operator/imageregistry_upgrade.py\", line 218, in getResourcePatchBody\n if 'UTC' in creation_time:\n"]}
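The backtrace ends at the membership test on creation_time, which raises a TypeError when that value is None. The following Python sketch reproduces this failure mode in isolation. It is an illustration only, not the code in imageregistry_upgrade.py: the helper names and the "creationTime" status key are assumptions chosen for the example.

```python
def get_creation_time(resource):
    # Hypothetical accessor: returns None when the status carries no creation
    # time, as could happen for a ContentLibrary that never reconciled.
    status = resource.get("status") or {}
    return status.get("creationTime")


def has_utc_suffix_unsafe(resource):
    creation_time = get_creation_time(resource)
    # Raises TypeError when creation_time is None, because the "in" operator
    # needs a container or iterable on its right-hand side.
    return "UTC" in creation_time


def has_utc_suffix_safe(resource):
    creation_time = get_creation_time(resource)
    # Guarding against None avoids the exception for unreconciled resources.
    return creation_time is not None and "UTC" in creation_time
```

With a resource whose status is empty, the unsafe variant raises the TypeError while the guarded variant returns False. Removing the stale library reference, as described in the resolution, avoids the condition without changing the script.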
The CSI controller log shows a "could not find any AvailabilityZone" error:
/var/log/pods/vmware-system-csi_vsphere-csi-controller-<controller suffix>/vsphere-csi-controller/0.log
YYYY-MM-DDTHH:MM:SS.323136695Z stderr F {"level":"info","time":"YYYY-MM-DDTHH:MM:SS.323087834Z","caller":"wcp/controller.go:102","msg":"Initializing WCP CSI controller","TraceId":"9a1e4147-2bbc-43a9-8521-0660ba500bad"}
YYYY-MM-DDTHH:MM:SS.332390722Z stderr F {"level":"error","time":"YYYY-MM-DDTHH:MM:SS.33225253Z","caller":"wcp/controller.go:112","msg":"failed to get clusterComputeResourceMoIds. err: could not find any AvailabilityZone","TraceId":"9a1e4147-2bbc-43a9-8521-0660ba500bad","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).Init\n\t/build/mts/release/bora-22899879/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/wcp/controller.go:112\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/mts/release/bora-22899879/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/mts/release/bora-22899879/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/driver.go:202\nmain.main\n\t/build/mts/release/bora-22899879/cayman_vsphere_csi_driver/vsphere_csi_driver/src/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/build/mts/release/bora-22899879/compcache/cayman_go/ob-22775654/linux64/src/runtime/proc.go:250"}
To resolve the issue, remove the orphaned content library 5f773a1c-5aa3-4268-871a-#### from the namespaces that reference it. This can be done through vAPI or DCLI. For example, if the namespace namespace-test is associated only with that library, run the following command to remove its associated libraries:
dcli> namespaces instances update --namespace namespace-test --content-libraries '[]'
This allows the upgrade to proceed past the image registry component.