VKS Supervisor stuck in "Removing" state

Article ID: 424566

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The VKS Supervisor cluster was manually deactivated, but it remains stuck in the "Removing" state.

  • As part of the removal process, the solution user "vpxd-extension" repeatedly initiates a "Remove Solution" task on the cluster. The task fails every time with the following error:

    A general system error occurred: Cannot download VIB 'https://hostupdate.vmware.com/software/VUM/PRODUCTION/addon-main/addon/<vib-specification>'. This might be because of network issues or the specified VIB does NOT exist or does NOT have a proper 'read' privilege set.
    Make sure the specified VIB exists and is accessible from vCenter Server.

  • The Supervisor Control Plane Virtual Machines are no longer present in the inventory or under the vSphere ESX Agent Manager.

  • In /var/log/vmware/wcp/wcpsvc.log on the vCenter Server Appliance, you see log entries similar to the following:

    info wcp [kubelifecycle/spherelet_lifecycle.go:169] [opID=<ID> Unconfigure spherelet from all hosts in cluster domain-c7
    debug wcp [kubelifecycle/controller.go:2331] [opID=<ID>] Wait for hosts to be cleanup from the cluster.
    warning wcp [kubelifecycle/controller.go:442] [opID=<ID>] Unable to disable cluster domain-c7. Err <nil>
    info wcp [pman/client.go:597] [opID=vLCM:Disable:domain-c<ID>] PMan API: Delete Task: URL: /api/esx/settings/clusters/domain-c7/software/solutions/com.vmware.vsphere-wcp?vmw-task=true
    info wcp [pman/client.go:603] [opID=vLCM:Disable:domain-c<ID>] PMan API: Delete Task: Attempt#[1 of 1]: Starting ...
    debug wcp [ssolib/sts.go:100] [opID=vLCM:Disable:domain-c<ID>] Getting HOK signer; store: vpxd-extension, alias: vpxd-extension
    debug wcp [logger/trace.go:92] [opID=<ID>] [ END ] [kubelifecycle.(*Controller).syncKubeInstanceState:400] [181.114µs] supervisor=<ID>
    debug wcp [pman/client.go:728] [opID=vLCM:Disable:domain-c<ID>] PMan API: vAPI session deletion is successful
    debug wcp [vcrestlib/client.go:177] [opID=vLCM:Disable:domain-c<ID>] vcrestlib: requesting new session
    info wcp [pman/client.go:612] [opID=vLCM:Disable:domain-c<ID>] PMan API: Delete Task: Attempt#[1 of 1]: Received Task ID: <ID>:com.vmware.esx.settings.clusters.software.solutions
    info wcp [pman/client.go:641] [opID=vLCM:Disable:domain-c<ID>] PMan API: Task Get for Delete Task API: Attempt#[1]:Failed Attempts:[0] of MaxFailedAttemps[1]Starting ...
    info wcp [pman/client.go:652] [opID=vLCM:Disable:domain-c<ID>] PMan API: Task Get for Delete Task API: Attempt#[1]:Failed Attempts:[0] of MaxFailedAttemps[1]Status = RUNNING
    info wcp [pman/client.go:672] [opID=vLCM:Disable:domain-c<ID>] PMan API: Task Get for Delete Task API: In Progress ...
    debug wcp [kubelifecycle/pman_client.go:63] [opID=vLCM:Disable:domain-c<ID>] PMan API: Delete Task RUNNING Output: {
            "value": {
                    "progress": {
                            "message": {
                                    "default_message": "Current progress for task created by VMware vSphere Lifecycle Manager",
                                    "id": "com.vmware.vcIntegrity.lifecycle.Task.Progress",
                                    "args": [],
                                    "params": null,
                                    "localized": "Current progress for task created by VMware vSphere Lifecycle Manager"
                            },
                            "total": 100,
                            "completed": 0
                    },
                    "result": null,
                    "error": {
                            "@class": "",
                            "error_type": "",
                            "messages": null
                    },
                    "status": "RUNNING",
                    "parent": "",
    error wcp [pman/client.go:659] [opID=vLCM:Disable:domain-c<ID>] PMan API: Task Get for Remove Task API: Attempt#[3]:Failed Attempts:[0] of MaxFailedAttemps[1]Error Message: Cannot download VIB 'https://hostupdate.vmware.com/software/VUM/PRODUCTION/addon-main/addon/<vib-specification>'. This might be because of network issues or the specified VIB does NOT exist or does NOT have a proper 'read' privilege set. Make sure the specified VIB exists and is accessible from vCenter Server.
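
To confirm that a given wcpsvc.log shows this signature, the file can be scanned for the "Cannot download VIB" error text quoted above. The following is a minimal sketch, not a supported tool: the function names find_vib_download_errors and scan_file are hypothetical, and the patterns assume only the error wording and the [opID=...] field as they appear in the excerpt above.

```python
import re

# Error wording taken from the message quoted in this article.
VIB_ERROR = re.compile(r"Cannot download VIB '(?P<url>[^']+)'")
# opID field as it appears in the wcpsvc.log excerpt above.
OP_ID = re.compile(r"\[opID=(?P<op>[^\]]+)\]")


def find_vib_download_errors(lines):
    """Return (line_number, op_id, url) for every line carrying the error.

    Hypothetical helper: it matches on the error text only, so it also
    works when each log line starts with a timestamp.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = VIB_ERROR.search(line)
        if match is None:
            continue
        op = OP_ID.search(line)
        hits.append((number, op.group("op") if op else None, match.group("url")))
    return hits


def scan_file(path="/var/log/vmware/wcp/wcpsvc.log"):
    """Scan a log file on disk; the default is the path named in this article."""
    with open(path, errors="replace") as handle:
        return find_vib_download_errors(handle)
```

Calling scan_file() on the appliance, or on a copy of the log, returns one tuple per failed attempt. Repeated hits with a "vLCM:Disable" opID are consistent with the recurring "Remove Solution" failure described above.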

Environment

vSphere Kubernetes Service

Cause

Removing spherelet from the ESXi hosts as part of the Supervisor cluster removal/deactivation relies on Lifecycle Manager synchronization. The LCM database still references the old set of URLs, which are no longer valid, and this causes the Supervisor deactivation to fail.
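
To see which download host a failing task still points at, the URL can be pulled out of the error message and compared with a list of hosts known to be retired. This is a sketch under stated assumptions: depot_host and references_stale_host are hypothetical names, and the set of stale hosts must be supplied by the operator, since this article does not list the old or new URLs.

```python
import re
from urllib.parse import urlsplit

# Error wording taken from the message quoted in this article.
VIB_URL = re.compile(r"Cannot download VIB '([^']+)'")


def depot_host(error_message):
    """Return the host name of the VIB URL in the error, or None if absent."""
    match = VIB_URL.search(error_message)
    if match is None:
        return None
    return urlsplit(match.group(1)).hostname


def references_stale_host(error_message, stale_hosts):
    """True if the error's URL host is in the operator-supplied stale set."""
    host = depot_host(error_message)
    return host is not None and host in {h.lower() for h in stale_hosts}
```

For the error shown in this article, depot_host returns hostupdate.vmware.com. Whether that host counts as stale in a given environment is for the operator to decide from the linked instructions.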

Resolution

Update the Lifecycle Manager database with the correct set of new URLs. Detailed instructions are in the article "Error: A general system error occurred: Failed to download VIB(s): Error: HTTP Error Code: 403, vLCM fails to download the ESXi patches and images from online repositories".

Once the LCM database is correctly updated, the recurring "Remove Solution" task completes successfully, which allows the Supervisor cluster deactivation to finish.