Supervisor update remains in 'Pending' state after successful control plane node upgrade

Products

VMware vSphere Kubernetes Service

Issue/Introduction

The Supervisor control plane nodes have been upgraded.
All the system pods are in a Running state.
The spherelet version has not been upgraded. To view the spherelet version from the Supervisor, run: kubectl get nodes
Query the status of the Supervisor with the following command:

dcli com vmware vcenter namespacemanagement software clusters list

When prompted, authenticate with the local administrator credentials, such as administrator@vsphere.local.

Example output:

|----------|------------|-----------------------------------------|-----------------------------------------|------------------------------------------|-------|------------------|
|cluster |cluster_name|desired_version |available_versions |current_version |state |last_upgraded_date|
|----------|------------|-----------------------------------------|-----------------------------------------|------------------------------------------|-------|------------------|
|domain-c##| |v1.##.6+vmware.3-fips-vsc9.0.0.0-248##022|v1.##.6+vmware.3-fips-vsc9.0.0.0-248##022|v1.##.10+vmware.1-fips-vsc9.0.0.0-248##022|PENDING| |
|----------|------------|-----------------------------------------|-----------------------------------------|------------------------------------------|-------|------------------|
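
When several Supervisors are listed, the PENDING rows can be picked out of the table programmatically. The following is a minimal sketch, assuming the dcli output has been captured as text in the pipe-delimited table layout shown above; the function names parse_dcli_table and pending_clusters are hypothetical helpers and not part of dcli.

```python
def parse_dcli_table(text):
    """Parse a pipe-delimited table (as printed above) into a list of dicts."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, non-table lines, and |-----|-----| separator lines.
        if not line.startswith("|") or set(line) <= {"|", "-"}:
            continue
        # Drop the empty fragments before the first and after the last pipe.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # the first table row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def pending_clusters(text):
    """Return the cluster IDs whose 'state' column reads PENDING."""
    return [row["cluster"] for row in parse_dcli_table(text)
            if row.get("state") == "PENDING"]
```

Passing the captured output to pending_clusters returns the IDs from the cluster column (for example, the domain-c## value above), which can then be matched against the opID in the WCP log.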
On the vCenter server, review the WCP logs at /var/log/vmware/wcp/wcpsvc.log:

[timestamp] error wcp [pman/client.go:657] [opID=vLCM:Upgrade:domain-c8] PMan API: Task Get for Apply Task API: Attempt#[15]:Failed Attempts:[0] of MaxFailedAttemps[1]Error Type: ERROR
[timestamp] error wcp [pman/client.go:659] [opID=vLCM:Upgrade:domain-c8] PMan API: Task Get for Apply Task API: Attempt#[15]:Failed Attempts:[0] of MaxFailedAttemps[1]Error Message: Failed getting host recommendation from DRS to enter maintenance mode for cluster 'example_hostname'. Reason: 'Currently connected device 'CD/DVD drive 1' uses backing '[example_datastore] example_image.iso', which is not accessible.'.
[timestamp] error wcp [kubelifecycle/pman_client.go:485] [opID=vLCM:Upgrade:domain-c8] PMan API: ApplyUpgradeTask INTERIM FAILURE Error: PMan API: Task Get for Apply Task API: Attempt#[15]:Failed Attempts:[0] of MaxFailedAttemps[1]Error Type: ERROR
[timestamp] error wcp [pman/client.go:615] [opID=vLCM:Upgrade:domain-c8] PMan API: Apply Task: Attempt#[1 of 1]: Has Failed - Task Get Polling failed!

On the vCenter server, review the Update Manager logs at /var/log/vmware/vmware-updatemgr/vmware_vum_server.log:

[timestamp] info vmware-vum-server[39##95] [Originator@6876 sub=PM.AsyncTask.ClusterApplySolutionTask{3218}] [vciTaskBase 1496] SerializeToVimFault fault:
--> (vmodl.fault.SystemError) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = (vmodl.LocalizableMessage) [
--> (vmodl.LocalizableMessage) {
--> key = "com.vmware.vcIntegrity.lifecycle.TaskError.GetNextHostToRemediateFailed",
--> arg = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "example_hostname"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "Currently connected device 'CD/DVD drive 1' uses backing '[example_datastore] example_image.iso', which is not accessible."
--> }
--> ],
--> message = <unset>
--> }
--> ],
--> reason = "vLCM Task failed, see Error Stack for details."
--> msg = "{
--> "data": null,
--> "error_type": "ERROR",
--> "messages": [
--> {
--> "args": [
--> "example_hostname",
--> "Currently connected device 'CD/DVD drive 1' uses backing '[example_datastore] example_image.iso', which is not accessible."
--> ],
--> "default_message": "Failed getting host recommendation from DRS to enter maintenance mode for cluster 'example_hostname'. Reason: 'Currently connected device 'CD/DVD drive 1' uses backing '[example_datastore] example_image.iso', which is not accessible.'.",
--> "id": "com.vmware.vcIntegrity.lifecycle.TaskError.GetNextHostToRemediateFailed"
--> }
--> ]
--> }
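
Both log excerpts carry the same "Currently connected device ... which is not accessible" message, so the blocking device and ISO can be extracted from either file with one pattern. The following is a minimal sketch, assuming the message wording matches the excerpts above exactly and that the device label and ISO path contain no apostrophes; find_blocking_media is a hypothetical helper name.

```python
import re

# Pattern taken from the message text in the log excerpts above.
BLOCKED_MEDIA = re.compile(
    r"Currently connected device '(?P<device>[^']+)' "
    r"uses backing '\[(?P<datastore>[^\]]+)\] (?P<path>[^']+)', "
    r"which is not accessible"
)


def find_blocking_media(log_text):
    """Return the unique (device, datastore, path) triples found in log_text."""
    matches = BLOCKED_MEDIA.finditer(log_text)
    # The same message repeats several times per failure, so deduplicate.
    return sorted({m.group("device", "datastore", "path") for m in matches})
```

The function takes the log contents as a string, for example the text read from wcpsvc.log or vmware_vum_server.log, and returns one entry per distinct device and ISO combination.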

Cause

The spherelet upgrade failed because DRS could not place the ESXi host into maintenance mode. A VM on the host has a CD/DVD drive connected to an ISO on an inaccessible or local datastore, which blocks the vMotion required by the vLCM remediation task.

Resolution

Disconnect the inaccessible media to allow DRS to migrate the VMs and remediate the hosts:

Identify the VM: Check the logs for the host and ISO name mentioned in the error, then locate the VM on that host whose CD/DVD drive uses that ISO.
Edit VM Settings: Change the CD/DVD drive to "Client Device" and clear the Connected and Connect At Power On check boxes.
Retry: Resume the upgrade once the VM is no longer pinned to the host.
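
The logs name the ISO but not the VM, so the identification step may need a search across VMs. The following is a minimal sketch, assuming the VM objects have already been retrieved (for example with pyVmomi, which is not shown here) and expose the vSphere API properties config.hardware.device, backing.fileName, connectable.connected, and connectable.startConnected; vms_using_iso is a hypothetical helper name, and it only reports matches without changing any VM.

```python
def vms_using_iso(vms, iso_backing):
    """Return (vm_name, device_label) pairs for devices that use iso_backing.

    iso_backing is the backing string from the error message, for example
    "[example_datastore] example_image.iso".
    """
    hits = []
    for vm in vms:
        config = getattr(vm, "config", None)
        if config is None:  # configuration unavailable, e.g. inaccessible VM
            continue
        for dev in config.hardware.device:
            backing_file = getattr(dev.backing, "fileName", None)
            connectable = getattr(dev, "connectable", None)
            if backing_file != iso_backing or connectable is None:
                continue
            # Report drives that are connected now or set to connect at power on.
            if connectable.connected or connectable.startConnected:
                hits.append((vm.name, dev.deviceInfo.label))
    return hits
```

Each returned pair names a VM and the drive label to edit as described in the steps above. The helper is duck-typed so it can also be exercised against plain stub objects before it is pointed at a live inventory.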

For further details, refer to: VMotion fails with the compatibility error.