A VKS cluster upgrade to a higher VKR version cannot be started and fails with an error similar to the following:
update cannot be initiated as <affected VKS cluster>'s SystemChecksSucceeded condition is not True.
The error contains a Message with more details on the specific component blocking the VKS cluster upgrade.
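As a minimal sketch, the Message can also be read directly from the cluster object. The helper name `get_system_checks_message` is hypothetical, and the sketch assumes that the condition is exposed under `.status.conditions` of the Cluster API Cluster object in the Supervisor namespace; adjust the resource or path if your environment differs.

```shell
# Hypothetical helper: print the Message of the SystemChecksSucceeded condition.
# Assumption: the condition lives under .status.conditions of the Cluster API
# Cluster object, queried from the Supervisor context.
get_system_checks_message() {
  cluster="$1"    # name of the affected VKS cluster
  namespace="$2"  # Supervisor namespace that contains the cluster
  kubectl get clusters.cluster.x-k8s.io "$cluster" -n "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="SystemChecksSucceeded")].message}'
}

# Usage:
#   get_system_checks_message <affected VKS cluster> <supervisor namespace>
```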
vSphere Supervisor
VKS Cluster
VKS 3.5 and higher
VKS 3.5 and 3.6 introduce system pre-checks to detect misconfigured Kubernetes components that are known to cause cluster upgrades to become stuck.
Previously, these misconfigurations could cause VKS cluster upgrades to stall or fail without a clear indication of the root cause.
When the system pre-checks detect one of these issues, they set the SystemChecksSucceeded condition to a value other than True and include a Message with further details.
These system pre-checks cover the following known issues, each described below: PodDisruptionBudgets that block rollouts, and misconfigured third party webhooks.
Follow the steps below that correspond to the Message reported in the error.
If there are any concerns regarding these steps, reach out to VMware by Broadcom Technical Support.
Message: PodDisruptionBudgets blocking rollouts
There are one or more PodDisruptionBudgets (PDBs) in the VKS cluster with an Allowed Disruption value of 0.
PDBs monitor the pod count for an application and can be configured to ensure that a specific number of pods are Running at all times. However, this can cause VKS cluster upgrades and rolling redeployments to become stuck, with a node left Deleting in Ready,SchedulingDisabled state, because the PDB prevents the pods on that node from being drained and terminated.
kubectl get pdb -A
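The command above lists every PDB in the cluster. As a minimal sketch, its output can be filtered down to the PDBs that currently allow 0 disruptions. The function name `filter_blocking_pdbs` is hypothetical, and the sketch assumes the default table layout of `kubectl get pdb -A`, where the fifth value on each data row is ALLOWED DISRUPTIONS.

```shell
# Print namespace/name for each PDB whose ALLOWED DISRUPTIONS value is 0.
# Assumption: default `kubectl get pdb -A` columns, in this order:
# NAMESPACE, NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE
filter_blocking_pdbs() {
  # NR > 1 skips the header row; $5 is the ALLOWED DISRUPTIONS value.
  awk 'NR > 1 && $5 == 0 { print $1 "/" $2 }'
}

# Usage against the affected VKS cluster:
#   kubectl get pdb -A | filter_blocking_pdbs
```

Reading the table output keeps the sketch free of extra tooling; if the column layout differs in your kubectl version, the field number needs to be adjusted.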
Message: MisconfiguredSoftwareChecks failed: [<third party webhook>]
Where the value in brackets is one of the following third party webhooks:
NOTE: VMware by Broadcom is not responsible for and does not provide support for third party applications.
Any issues with webhooks installed by a third party application should be escalated to the third party application owner.
The following steps detail how to take a backup of and temporarily delete the third party webhooks in the affected VKS cluster, and how to restore them from the backup afterwards.
kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration -A
kubectl get validatingwebhookconfiguration <third party validating webhook configuration> -o yaml > <vwc-backup>.yaml
kubectl get mutatingwebhookconfiguration <third party mutating webhook configuration> -o yaml > <mwc-backup>.yaml
kubectl delete validatingwebhookconfiguration <third party validating webhook configuration>
kubectl delete mutatingwebhookconfiguration <third party mutating webhook configuration>
kubectl apply -f <vwc-backup>.yaml
kubectl apply -f <mwc-backup>.yaml
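Because the restore depends on the backup file, it can help to make the delete conditional on a successful backup. The following is a minimal sketch of that idea using only the kubectl commands shown above; the function name `backup_and_delete_webhook` and its arguments are hypothetical.

```shell
# Hypothetical helper: back up a webhook configuration, then delete it only if
# the backup succeeded and the backup file is not empty.
#   $1: validatingwebhookconfiguration or mutatingwebhookconfiguration
#   $2: name of the third party webhook configuration
#   $3: path of the backup file to write
backup_and_delete_webhook() {
  kind="$1"; name="$2"; backup="$3"
  # Stop if the object cannot be read; nothing is deleted in that case.
  kubectl get "$kind" "$name" -o yaml > "$backup" || return 1
  # Stop if the backup file is empty.
  [ -s "$backup" ] || return 1
  kubectl delete "$kind" "$name"
}

# Usage:
#   backup_and_delete_webhook validatingwebhookconfiguration <name> <vwc-backup>.yaml
# Restore later with:
#   kubectl apply -f <vwc-backup>.yaml
```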
System webhooks that are expected in the environment are related to the CNI or to any installed packages (PKGI) in the workload cluster.
For example, the expected Antrea system webhooks are:
----
Release Notes: vSphere Kubernetes Service 3.6.0+v1.35