Proactive Upgrade Case for VKS/Workload Cluster

Products

VMware vSphere Kubernetes Service

Issue/Introduction

Note: The Proactive Upgrade process outlined in this KB is available only to Advanced Support customers with a Support Account Manager (SAM) and/or Dedicated Technical Support Engineer (DTSE).

Prerequisites to be followed before opening Proactive VKS/Workload Cluster Upgrade Cases:

Cases must be opened 10 business days in advance of the upgrade window
Cases should be opened as Low-P4

VMware by Broadcom Support will:

Validate the target version and advise if a newer one is available
Review the provided logs for known issues that may impact the upgrade. Provide the Workload/Supervisor Management Support Bundle and the Workload Cluster Support Bundle as described in Gathering Logs for vSphere with Tanzu
If a proactive Case is not raised at least 10 business days in advance, it will not be possible to offer log review and pre-check resolution within the planned time frame
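
As a planning aid, the 10-business-day lead time can be estimated with a small shell function. This is only a sketch under stated assumptions: the function name `latest_case_open_date` is illustrative, business days are taken to be Monday through Friday, public holidays are ignored, the upgrade day itself is not counted, and GNU `date` (the `-d` option) is required.

```shell
#!/usr/bin/env bash
# Sketch: count back N business days (Mon-Fri) from an upgrade date.
# Assumptions: GNU date is available; public holidays are NOT considered.

latest_case_open_date() {
  local day="$1" remaining="${2:-10}"
  while [ "$remaining" -gt 0 ]; do
    # Step back one calendar day at a time.
    day="$(date -u -d "$day 1 day ago" +%F)" || return 1
    # %u is the ISO weekday: 1 = Monday ... 7 = Sunday.
    if [ "$(date -u -d "$day" +%u)" -le 5 ]; then
      remaining=$((remaining - 1))
    fi
  done
  echo "$day"
}

# Example: an upgrade planned for Friday 2025-06-20
# latest_case_open_date 2025-06-20    # prints 2025-06-06
```

Opening the case earlier than the computed date leaves room for holidays and for any follow-up questions on the logs.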

Please complete the following template when opening the Case:

Upgrade Date & Time?
Current Supervisor Cluster version?
Current Workload Cluster VKR?
Current Workload Cluster OS and OS Version?
Current Workload Cluster ClusterClass version?
Current VKS (VMware vSphere Kubernetes Service) Supervisor Service version?
Target Workload Cluster VKR version?
Target Workload Cluster OS and OS Version?
Target Workload Cluster ClusterClass version?
Target VKS Supervisor Service version?
Product Interoperability checked?
ClusterClass Compatibility Matrices checked?
Steps to Perform Before Upgrading to ClusterClass 3.30 or KR 1.32.X to 1.33.X checked?
Have the VKR release notes and VKS release notes been reviewed for the target version?
Number of Workload Clusters being upgraded (if applicable)?
Has a backup of the Workload Cluster been taken with Velero?
Is the Workload Cluster managed by TMC (Tanzu Mission Control)?
Deployment Type (NSX, VDS + AVI, VDS + HAProxy, VDS + FLB)?
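
Some of the "Current" fields in the template can be read directly from the cluster objects. The sketch below is illustrative rather than an official procedure: the function names are hypothetical, and it assumes the workload cluster is a Cluster API `Cluster` with a managed topology that exposes `spec.topology.class` and `spec.topology.version` (the field paths may differ on other API versions). The node OS image is read from the standard `status.nodeInfo.osImage` node field.

```shell
#!/usr/bin/env bash
# Sketch: read template values from cluster objects.
# Function names are illustrative; field paths assume a Cluster API
# Cluster object with a managed topology.

# Run in the Supervisor context.
cluster_template_fields() {
  local ns="$1" cluster="$2"
  kubectl get cluster -n "$ns" "$cluster" \
    -o jsonpath='ClusterClass: {.spec.topology.class}{"\n"}Kubernetes version: {.spec.topology.version}{"\n"}'
}

# Run in the Workload Cluster context.
# Prints each distinct node OS image with the number of nodes using it.
node_os_summary() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.osImage}{"\n"}{end}' |
    sort | uniq -c
}

# Example:
# cluster_template_fields <workload cluster namespace> <workload cluster name>
# node_os_summary
```

Values obtained this way should still be confirmed against the release notes and compatibility matrices listed in the template.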

Additional data to be collected using the commands below:

VDT tool output

From the Supervisor Context, check the Supervisor Cluster and Workload Cluster Object Health:

Command	Purpose
`kubectl get tkc,cluster -A`	List all workload clusters across all namespaces (TKC applies only if you are still using the deprecated TKC object)
`kubectl get cluster -n <workload cluster namespace> <workload cluster name> -o yaml | less`	Check the Conditions of the workload cluster to be upgraded to validate its health
`kubectl get pkgi -n <workload cluster namespace>`	List all installed Kapp-controlled package services
`kubectl get kcp,md,machine -A`	Check workload cluster machine health and VM objects
`kubectl get kcp,md,machine -n <cluster namespace>`	View workload cluster control plane and worker machine details for a specific cluster namespace
`kubectl get pods -A | grep -v Running`	List Supervisor cluster system pods that are not in Running state across namespaces
`kubectl get kr`	Verify vSphere Kubernetes Release (VKR) version compatibility (TKR is used in VKS 3.1 and lower)
`kubectl get pkgi -n vmware-system-supervisor-services`	Verify the status and versions of supervisor services
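
The Supervisor-context checks in the table above can be captured in one pass so the output is ready to attach to the case. The following is a minimal sketch, not an official collection tool: the function name `collect_supervisor_checks`, the default output directory, and the file names are illustrative assumptions, and `kubectl` is assumed to be already using the Supervisor context.

```shell
#!/usr/bin/env bash
# Sketch: save the output of each Supervisor-context check to its own file.
# Assumes kubectl already points at the Supervisor context.
# Function name, directory, and file names are illustrative.

collect_supervisor_checks() {
  local ns="$1" cluster="$2" out="${3:-supervisor-checks}"
  mkdir -p "$out" || return 1

  # Errors are written to the files as well, so a failing check is still recorded.
  kubectl get tkc,cluster -A                        > "$out/clusters.txt" 2>&1
  kubectl get cluster -n "$ns" "$cluster" -o yaml   > "$out/cluster.yaml" 2>&1
  kubectl get pkgi -n "$ns"                         > "$out/pkgi-namespace.txt" 2>&1
  kubectl get kcp,md,machine -A                     > "$out/machines-all.txt" 2>&1
  kubectl get kcp,md,machine -n "$ns"               > "$out/machines-namespace.txt" 2>&1
  kubectl get kr                                    > "$out/kr.txt" 2>&1
  kubectl get pkgi -n vmware-system-supervisor-services \
                                                    > "$out/supervisor-services.txt" 2>&1
  # grep exits non-zero when every pod is Running; that is not an error here.
  kubectl get pods -A 2>&1 | grep -v Running        > "$out/pods-not-running.txt" || true

  echo "Saved Supervisor checks to $out"
}

# Example:
# collect_supervisor_checks <workload cluster namespace> <workload cluster name>
```

Keeping one file per command makes it easier to match each output to the corresponding row of the table.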

From the Workload Cluster Context, check the Workload Cluster Health:

Command	Purpose
`kubectl get nodes -A`	List all nodes in the workload cluster
`kubectl get pods -A | grep -v Running`	List workload cluster system pods that are not in Running state across namespaces
`kubectl get pkgi -A`	List all system installed packages and Kapp-controlled installed standard packages
`kubectl get pdb -A`	List all Pod Disruption Budgets (PDBs)
`kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration -A`	List all webhooks, including third party webhooks
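
When reviewing the PDB list from the table above, budgets that currently allow zero disruptions are worth a closer look, because they can prevent a node from being drained during a rolling upgrade. The sketch below is one possible way to filter for them; the function name `pdbs_blocking_drain` is illustrative, and it relies on the standard `status.disruptionsAllowed` field of a PodDisruptionBudget.

```shell
#!/usr/bin/env bash
# Sketch: list PDBs that currently allow zero disruptions.
# Run in the Workload Cluster context. Function name is illustrative.

pdbs_blocking_drain() {
  kubectl get pdb -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    # Column 3 is the number of disruptions currently allowed.
    # Rows where the field is not populated are skipped.
    awk '$3 == 0 { print $1 "/" $2 }'
}

# Example:
# pdbs_blocking_drain
```

An empty result means no PDB is currently at zero allowed disruptions; it does not replace reviewing the full `kubectl get pdb -A` output.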

Validate Certificate details:

In the Supervisor cluster context, run `certmgr tkc certificates list -n <namespace> <workload cluster name>` as described in this reference article.

Verify SSO Login:

Validate login to the affected workload cluster's context is successful and kubectl commands can be run according to assigned SSO permissions.
- In vSphere 8.0, this is done through kubectl vsphere login commands.
- In vSphere 9.0, this is done through the vcf CLI.
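
For vSphere 8.0, the login validation can be sketched as follows. This is an illustration under assumptions rather than a prescribed procedure: the function name `sso_login_check` is hypothetical, the vSphere Plugin for kubectl is assumed to be installed, and the kubeconfig context created by the login is assumed to be named after the workload cluster. The vcf CLI flow for vSphere 9.0 is not shown.

```shell
#!/usr/bin/env bash
# Sketch: log in to a workload cluster as an SSO user (vSphere 8.0) and
# list what that user is allowed to do.
# Assumes the vSphere Plugin for kubectl is installed. Function name is illustrative.

sso_login_check() {
  local server="$1" user="$2" ns="$3" cluster="$4"

  # Prompts for the SSO password interactively.
  kubectl vsphere login --server="$server" \
    --vsphere-username "$user" \
    --tanzu-kubernetes-cluster-namespace "$ns" \
    --tanzu-kubernetes-cluster-name "$cluster" || return 1

  # Assumption: the login creates a context named after the workload cluster.
  kubectl config use-context "$cluster" || return 1

  # Show the actions permitted by the assigned SSO permissions.
  kubectl auth can-i --list
}

# Example:
# sso_login_check <supervisor address> <sso user> <namespace> <workload cluster name>
```

A successful run shows that the login works and that kubectl commands are accepted with the permissions assigned to that SSO user.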

PVC Health Check:

To check PVC health, create a test Persistent Volume Claim (PVC) of 1 GB and then delete it. This confirms that both PVC creation and deletion are functioning correctly.
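
The create-and-delete test could look like the sketch below. The function name `pvc_smoke_test` and the PVC name `upgrade-precheck-pvc` are illustrative, and the namespace and storage class must be supplied for your environment. `1Gi` is used as the closest Kubernetes quantity to 1 GB. If the storage class uses the `WaitForFirstConsumer` volume binding mode, the claim stays `Pending` until a pod uses it, so the wait step would time out even on a healthy cluster.

```shell
#!/usr/bin/env bash
# Sketch: create a 1Gi test PVC, wait for it to bind, then delete it.
# Run in the Workload Cluster context. Names are illustrative.

pvc_smoke_test() {
  local ns="$1" sc="$2" name="upgrade-precheck-pvc"

  kubectl apply -f - <<EOF || return 1
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ${sc}
  resources:
    requests:
      storage: 1Gi
EOF

  # Wait up to two minutes for the claim to reach the Bound phase.
  local bound=0
  kubectl wait --for=jsonpath='{.status.phase}'=Bound "pvc/${name}" \
    -n "$ns" --timeout=120s || bound=$?

  # Always clean up, even if the claim never bound.
  kubectl delete pvc "$name" -n "$ns" || return 1
  return "$bound"
}

# Example:
# pvc_smoke_test <namespace> <storage class name>
```

A zero exit status indicates that the claim was created, bound, and deleted; a non-zero status shows which step to investigate before the upgrade.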

Environment

vSphere with Tanzu 8.x/9.x - Supervisor Cluster with Workload Cluster(s)

Resolution

Open a Low - P4 Upgrade Awareness Case via the Broadcom Support portal.
The purpose of a proactive upgrade case is to prepare in advance for the VKS or Workload Cluster upgrade.
This is not an exhaustive health check of the environment. If this is required, please engage VMware by Broadcom Professional Services.
If an issue is experienced during upgrade, Advanced Support customers should uplift their maintenance case.

Additional Information

When upgrading workload clusters, make sure you are logged in as an SSO user so that the proper system checks are performed.

In vSphere 8.0, this is done through kubectl vsphere login commands.

In vSphere 9.0, this is done through the vcf CLI.

See the following documentation for details:

Configuring Identity and Access for VKS Clusters

Grant Developer Access to VKS Clusters