Symptoms:
- When upgrading WCP Supervisor Cluster from vCenter, the upgrade is stuck at 50% with GUI messaging in vCenter showing: "Namespaces cluster upgrade is in upgrade component steps"
- When reviewing the cluster status from vCenter SSH using DCLI, you see messaging similar to:
dcli> namespacemanagement software clusters list
|------------|------------|-----------------------------------|-----------------------------------|-----------------------------------|-------|--------
|cluster |cluster_name|desired_version |available_versions |current_version |state |last_upgraded_date |
|------------|------------|-----------------------------------|-----------------------------------|-----------------------------------|-------|--------
|domain-c8| |v1.19.1+vmware.2-vsc0.0.10-18245956|v1.19.1+vmware.2-vsc0.0.10-18245956|v1.18.2-vsc0.0.7-17449972 |PENDING|2022-06-06T06:13:51.000Z|
|------------|------------|-----------------------------------|-----------------------------------|-----------------------------------|-------|--------
dcli> namespacemanagement software clusters get --cluster domain-c8
upgrade_status:
desired_version: v1.19.1+vmware.2-vsc0.0.10-18245956
messages:
- severity: ERROR
details: A general system error occurred.
progress:
total: 100
completed: 50
message: Namespaces cluster upgrade is in upgrade components step.
available_versions:
- v1.19.1+vmware.2-vsc0.0.10-18245956
current_version: v1.18.2-vsc0.0.7-17449972
messages:
- severity: WARNING
details: Cluster domain-c8 is on the oldest version.
state: PENDING
last_upgraded_date: 2022-06-06T06:13:51.000Z
- The /var/log/vmware/wcp/wcpsvc.log on vCenter shows messaging similar to:
2022-06-06T06:13:51.213Z error wcp [kubelifecycle/upgrade_controller.go:312] [opID=upgrade-domain-c8] Failed to upgrade cluster domain-c8, upgrade state: ERROR
........
2022-06-06T08:02:04.795Z error wcp [kubelifecycle/compupgrade.go:235] [opID=upgrade-domain-c8] Component upgrade failed: Component upgrade failed.
- Messaging in the wcpsvc.log on vCenter is followed by errors in the TkgUpgrade component:
2022-06-06T08:05:21.480Z DEBUG comphelper: ret=0 out={"controller_id": "420c1019beddcb6b04247eb123c5689", "state": "error", "messages": [{"level": "error", "message": "Component TkgUpgrade failed: Failed to run command:
- Within the above message stack, you see the specific component failure is:
The validatingwebhookconfigurations \"vmware-system-tkg-validating-webhook-configuration\" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update\n"}]}
- This failure may also present with the CapwUpgrade component, specifically the capi-kubeadm-control-plane-validating-webhook-configuration webhook.
- Further logging can be found on the Supervisor node SSH in the /var/log/vmware/upgrade-ctl-compupgrade.log file