The lifetime of Embedded vCLS VMs is generally managed automatically. However, these mechanisms can fail, leaving a VM on an ESXi host where it is no longer wanted. In such cases, an administrator may need to use the ESXi Shell to remove the unwanted VM manually. This article explains the process.
This article applies to the following versions:
vCenter Server 8.0 U3
ESXi 8.0 U3
In most cases, Embedded vCLS VMs are destroyed automatically whenever they are no longer wanted on a host. This includes:

Putting the host into Maintenance Mode or Standby Mode
Disconnecting the host
Removing the host from the cluster or the inventory
Setting an Anti-Affinity rule for the host (as long as at least one other host without such a rule is available)
Enabling Retreat Mode on the cluster

The VMs may not be cleaned up if a host is removed while it is Not Responding, or if a cluster is destroyed directly while Embedded vCLS VMs are present. However, re-adding an affected host to a supported vCenter as a Standalone host should clean up the lingering VM.
Despite these safeguards, an Embedded vCLS VM may occasionally fail to be destroyed. A host-side problem may cause the destroy operations to fail, or the host may be added to a vCenter that isn't vCLS-aware. In these cases, the VM can be stuck in place, and an administrator needs to destroy it manually on the affected host.
Attempting to power off an Embedded vCLS VM via inventory operations will usually destroy the current instance of the VM, which can clear up some transient issues. However, this does not change the desired state, which still calls for the VM to be running, so the host will quickly deploy a new instance in its place.
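To observe this reconciliation, the pod listing command used later in this article can be run on the host shortly after a power-off attempt. The session below is illustrative (the pod name is reused from the examples further down; actual names will differ) and assumes the host still considers the VM desired:

[root@localhost:~] inf-cli get pods -n vcls
NAMESPACE   NAME                                        STATUS    REASON   IP_ADDRESS
vcls        vcls-420fb029-6faa-4319-14d4-58b5a10954c2   RUNNING   N/A      N/A

A new instance, under a new name, is expected to appear within moments of the old one being destroyed.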
The following steps can be used to manually destroy an Embedded vCLS VM on ESXi 8.0 Update 3. These actions are performed in the ESXi Shell, typically over SSH.
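Before starting, it can be useful to confirm that the host is actually running 8.0 Update 3. A quick check with the standard vmware command (build number shown as a placeholder; your output will include the real build):

[root@localhost:~] vmware -vl
VMware ESXi 8.0.3 build-xxxxxxxx
VMware ESXi 8.0 Update 3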
Warning: This guide does not apply to Embedded vCLS VMs that are running in an intended state, meaning on a host that is supported, connected, available, not entering Maintenance Mode, and connected to a supported vCenter in a cluster where Retreat Mode and Anti-Affinity rules aren't excluding the host from running the VM. In such cases, destroying the VM with this method will cause it to be re-deployed shortly, or cause vCenter and ESXi to fall out of sync regarding the VM's state.
These steps are safe to perform if the host is stuck entering Maintenance Mode according to vCenter. However, performing them while the host is already in Maintenance Mode is not recommended. A state where an Embedded vCLS VM is running on a host in Maintenance Mode is difficult to reach but possible; if that is the case, take the host out of Maintenance Mode first.
Verify that the infravisor service is running.
Check the status.
[root@localhost:~] /etc/init.d/infravisor status
infravisor is running

# If the output was "infravisor is not running", start the service
[root@localhost:~] /etc/init.d/infravisor start
Read the configuration.
[root@localhost:~] configstorecli config current get -c esx -g infravisor_pods -k vcls
{
   "pod_settings": {
      "enabled": true,
      ...
   }
}
If the value of pod_settings.enabled is set to true, update it to false.
[root@localhost:~] configstorecli config current set -c esx -g infravisor_pods -k vcls -p /pod_settings/enabled -v false
Set: completed successfully
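To verify the change took effect, re-run the get command from the earlier step. Assuming the set completed successfully, the output should now show the flag disabled:

[root@localhost:~] configstorecli config current get -c esx -g infravisor_pods -k vcls
{
   "pod_settings": {
      "enabled": false,
      ...
   }
}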
Confirm that the pod is no longer running.
# Success
[root@localhost:~] inf-cli get pods -n vcls
No Pods found in namespace: vcls

# Failure
[root@localhost:~] inf-cli get pods -n vcls
NAMESPACE   NAME                                        STATUS    REASON   IP_ADDRESS
vcls        vcls-420fb029-6faa-4319-14d4-58b5a10954c2   RUNNING   N/A      N/A
If a pod was found, kill it.
[root@localhost:~] inf-cli kill -p /etc/vmware/infravisor/manifests/vcls.yaml
Killed podVM for pod vcls/vcls-420fb029-6faa-4319-14d4-58b5a10954c2
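As a final check, re-run the listing command from the previous step. With the pod killed and pod_settings.enabled set to false, the expected result is:

[root@localhost:~] inf-cli get pods -n vcls
No Pods found in namespace: vcls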