vCLS virtual machine deployment fails in the vCenter due to orphaned vCLS virtual machines on disconnected ESXi hosts


Article ID: 419061


Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • vCLS virtual machines are missing or unhealthy in a vSphere Cluster.

  • Enabling and disabling "Retreat Mode" fails to deploy new vCLS virtual machines.

  • Tasks related to vCLS virtual machine deletion or cleanup may fail with errors indicating the host is unreachable or the connection is lost.

  • One or more ESXi hosts that previously resided in the cluster are now in a disconnected or not responding state.

  • The following log snippets are observed in the /var/log/vmware/eam/eam.log file of the vCenter server:
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | ClusterAgentIssueHandler.java | 163 | Resolving ClusterAgent(ID: 'Agent:########-####-####-####-############:null') issues: issues=[I@########, unknown=null
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | OpId.java | 37 | [vm-1027->VM:Runtime:################] created from [ISSUE_CHECK:################]
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vim-async-1 | OpIdLogger.java | 35 | [vm-#->VM:Runtime:################] Completed.
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | ClusterAgentIssueHandler.java | 179 | Not resolvable issues:[#]
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | ClusterAgent.java | 965 | [checkGoal:ClusterAgent(ID: 'Agent:########-####-####-####-############:null')] has issues.
    YYYY-MM-DDTHH:MM:SS.Z |  WARN | vlsi | AgentBase.java | 1118 | ClusterAgent(ID: 'Agent:########-####-####-####-############:null') status is not consistent with the remaining operations.
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | AgencyIssueHandler.java | 117 | Resolving ClusterVMAgency(ID:'Agency:########-####-####-####-############:null'): issues=int[] [
       #
    ], unknown=null
    .
    .
    
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vim-async-0 | Workflow.java | 121 | [VirtualMachine:vm-#->VM:Delete:################] FAILED
    com.vmware.eam.EamRemoteSystemException: Unexpected error powering off ########-####-####-####-############::VirtualMachine:vm-#
            at com.vmware.eam.vim.vm.impl.VirtualMachine.powerOffExcTransform(VirtualMachine.java:372) ~[eam-server.jar:?]
            at com.vmware.eam.vim.task.impl.VimTask.processCompleted(VimTask.java:99) ~[eam-server.jar:?]
            at com.vmware.eam.vim.task.impl.VimTask.lambda$triggerPullResult$1(VimTask.java:75) ~[eam-server.jar:?]
            .
            .
    Caused by: com.vmware.vim.binding.vmodl.fault.HostNotConnected: Unable to communicate with the remote host, since it is disconnected.
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_422]
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_422]
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_422]
            .
            .
  • The /var/log/vmware/eam/eam_api.log file of the vCenter server shows the following entries:
    YYYY-MM-DDTHH:MM:SS.Z |  INFO | vlsi | LocalizationFilter.java | 108 | API COMPLETE: ClusterVMAgency(ID:'Agency:########-####-####-####-############:null').queryRuntime[opId=268271747, sessionId=73A7F452]. Result:
    eam.EamObject.RuntimeInfo {
       issue = (eam.issue.Issue) [
          (eam.issue.cluster.agent.VmNotRemoved) {
             description = <unset>,
             key = #,
             time = YYYY-MM-DD HH:MM:SS,
             agency = 'Agency:########-####-####-####-############:null',
             solutionId = 'VSPHERE.LOCAL\vpxd-extension-########-####-####-####-############',
             agencyName = 'vCLS',
             solutionName = ' ',
             agent = 'Agent:########-####-####-####-############:null',
             cluster = 'ClusterComputeResource:domain-c#:########-####-####-####-############',
             vm = 'VirtualMachine:vm-#:########-####-####-####-############',
          },
       ],
       goalState = 'uninstalled',
       entity = 'Agency:########-####-####-####-############:null',
       status = 'red',
    }
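The eam.log signature above (a failed VM:Delete workflow followed by a HostNotConnected cause) can be searched for programmatically. The following is a minimal sketch, not a supported tool: the function name `find_failed_deletes` and the `lookahead` window are hypothetical, and the patterns are derived only from the excerpt shown in this article, so they may need adjusting for other builds.

```python
import re
from pathlib import Path

# Patterns taken from the eam.log excerpt in this article (assumption:
# the line layout is the same on the vCenter build being checked).
DELETE_FAILED = re.compile(r"\[VirtualMachine:(vm-\S+?)->VM:Delete:[^\]]*\]\s+FAILED")
HOST_NOT_CONNECTED = "vmodl.fault.HostNotConnected"


def find_failed_deletes(lines, lookahead=40):
    """Return VM IDs whose delete workflow failed with HostNotConnected.

    `lookahead` (hypothetical parameter) limits how many lines of stack
    trace are inspected after each FAILED line.
    """
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        match = DELETE_FAILED.search(line)
        if not match:
            continue
        for follow in lines[i + 1 : i + 1 + lookahead]:
            if DELETE_FAILED.search(follow):
                break  # next failure starts here; stop scanning this one
            if HOST_NOT_CONNECTED in follow:
                hits.append(match.group(1))
                break
    return hits


if __name__ == "__main__":
    log = Path("/var/log/vmware/eam/eam.log")
    if log.exists():
        with log.open(errors="replace") as fh:
            for vm_id in find_failed_deletes(fh):
                print(vm_id)
```

Run on the vCenter Server Appliance, the script prints one VM identifier per matching failure. An empty result only means the pattern was not found in the current log file; rotated logs are not read.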

 

Environment

  • vCenter Server Appliance 7.x
  • vCenter Server Appliance 8.x

Cause

This issue occurs because the vPostgres database of the vCenter (VCDB) retains records that show the vCLS virtual machines as still registered to the disconnected ESXi hosts.

When vCenter cleans up the environment (triggered by Retreat Mode or by routine health checks), it contacts the host on which each vCLS virtual machine is registered in order to power off and delete that virtual machine. Because the host is disconnected or powered down, vCenter cannot reach the host agent, and the operation fails with a HostNotConnected fault. The database entry therefore remains stale, and new vCLS virtual machines cannot be deployed until the old state is reconciled.

Resolution

  • Identify the affected cluster and note the host(s) on which the vCLS virtual machines were last registered.

  • Bring the host(s) back online or place them in a state reachable by vCenter so cleanup of existing vCLS virtual machines can complete.

  • If the host(s) cannot be brought online, remove or unregister the stale vCLS virtual machine records from the VCDB by following the article:
    Unable to deploy vCLS VMs by switching to Retreat mode
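
To help with the first step, the cluster and virtual machine identifiers can be read from the VmNotRemoved issues recorded in eam_api.log. The sketch below assumes the issue blocks are laid out exactly as in the excerpt in this article; the function name `stale_vm_records` is hypothetical, and the script only reads text, it does not query or change vCenter.

```python
import re

# Marker and field layout copied from the eam_api.log excerpt in this
# article (assumption: other builds print the issue block the same way).
ISSUE_START = "(eam.issue.cluster.agent.VmNotRemoved)"
FIELD = re.compile(r"^\s*(cluster|vm)\s*=\s*'([^']*)'")


def stale_vm_records(lines):
    """Return one {'cluster': ..., 'vm': ...} dict per VmNotRemoved block."""
    records = []
    current = None  # fields of the issue block being read, if any
    for line in lines:
        if ISSUE_START in line:
            current = {}
            continue
        if current is None:
            continue
        match = FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2)
        elif line.strip().startswith("}"):
            # Closing brace ends the issue block.
            records.append(current)
            current = None
    return records


if __name__ == "__main__":
    try:
        with open("/var/log/vmware/eam/eam_api.log", errors="replace") as fh:
            for record in stale_vm_records(fh):
                print(record.get("cluster"), record.get("vm"))
    except FileNotFoundError:
        pass  # not running on a vCenter Server Appliance
```

Each printed pair names a cluster and the vCLS virtual machine that EAM could not remove from it. The host itself is not part of the issue record and has to be looked up separately.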

Note: Take a snapshot/backup of the vCenter Server Appliance (VCSA) before making changes.