ESXi host is unable to enter maintenance mode with vCLS VM not powering off

Article ID: 417364

Products

  • VMware vCenter Server
  • VMware vSphere ESXi
  • VMware vSphere ESXi 8.0
  • VMware vCenter Server 8.0

Issue/Introduction

  • The ESXi host is stuck entering maintenance mode. The task shows that it is waiting for all VMs to be migrated off the host.

  • A vCLS-## VM remains powered on on the host and does not power off.

  • The vCenter Server vpxd.log contains entries similar to the following:

    /var/log/vmware/vpxd/vpxd.log

    YYYY-MM-DDTHH:MM:SS.###+##:## info vpxd[#####] [Originator@#### sub=MoCluster opID=PodCrxMgr-domain-c####-######] Dumping vCLS Pod Crx host infos; domain-c####, [{[vim.HostSystem:host-####,<host_name>], st: #, vm: [vim.VirtualMachine:vm-####,vCLS-########-####-#a##-####-#########c#b], t: (null), ghost vm: (null), aa: false, f: 0, }, {[vim.HostSystem:host-####,<host_name>], st: #, vm: [vim.VirtualMachine:vm-2589,vCLS-########-####-#a##-####-#########c##], t: (null), ghost vm: (null), aa: false, f: 0, }, {[vim.HostSystem:host-####,<host_name>], st: 0, vm: (null), t: (null), ghost vm: (null), aa: false, f: 0, }]

  • The ESXi host infravisor.log shows the error message "Failed to determine if pod was stopped", and a stale vCLS virtual machine reference remains on the host because of the sync failure.

    /var/log/vmware/infravisor.log 

    YYYY-MM-DDTHH:MM:SS<time_zone> No(5) infravisor[2099185]: time="YYYY-MM-DDTHH:MM:SS<time_zone>" level=info msg="Pod deleted" namespace=vcls pod=vcls-######-####-#a##-####-########## uid=####################
    YYYY-MM-DDTHH:MM:SS<time_zone> No(5) infravisor[2099185]: time="YYYY-MM-DDTHH:MM:SS<time_zone>" level=error msg="Failed to determine if pod was stopped" error="ServerFaultCode: The object 'vim.VirtualMachine:###' has already been deleted or has not been completely created" namespace=vcls pod=vcls-########-####-#a##-####-#########c## uid=##################
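
The infravisor.log symptom described above can be checked for with a short script. The following is a minimal sketch, assuming the log keeps the key=value layout shown in the sample entries. The names `parse_fields`, `find_stuck_pods`, and `SAMPLE_LINES`, as well as the sample values, are illustrative and are not part of any VMware tooling.

```python
import re

# Matches key=value and key="quoted value" pairs as they appear in the
# sample entries. The syslog prefix has no '=' and is therefore ignored.
FIELD_RE = re.compile(r'([\w-]+)=("(?:[^"\\]|\\.)*"|\S+)')

ERROR_MSG = "Failed to determine if pod was stopped"


def parse_fields(line):
    """Return the key/value pairs found on one infravisor.log line."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def find_stuck_pods(lines):
    """Return the vcls pod names that logged the stop-detection error."""
    pods = []
    for line in lines:
        fields = parse_fields(line)
        if (fields.get("level") == "error"
                and fields.get("msg") == ERROR_MSG
                and fields.get("namespace") == "vcls"):
            pod = fields.get("pod")
            if pod and pod not in pods:
                pods.append(pod)
    return pods


# Hypothetical lines shaped like the entries in this article.
SAMPLE_LINES = [
    'No(5) infravisor[1]: time="t0" level=info msg="Pod deleted" '
    'namespace=vcls pod=vcls-aaaa uid=1111',
    'No(5) infravisor[1]: time="t1" level=error '
    'msg="Failed to determine if pod was stopped" '
    'error="ServerFaultCode: ..." namespace=vcls pod=vcls-bbbb uid=2222',
]

print(find_stuck_pods(SAMPLE_LINES))
```

To run this against a real host log, the lines could be read from a copy of /var/log/vmware/infravisor.log instead of `SAMPLE_LINES`.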

Environment

  • VMware vCenter Server 8.x
  • vSphere ESXi 8.x

Cause

When a host is placed into maintenance mode, its vCLS VM is expected to be powered off and removed. vCenter Server (vpxd) triggers the Infravisor service on the host to perform this removal. In this case, vpxd does not issue the command because it is out of sync with the Infravisor service that manages the vCLS VM on the host.

Resolution

  • To resolve this issue, restart the vCenter Server Appliance, which also restarts the vpxd service.
  • The restart refreshes the data that vCenter Server holds for the vCLS VMs and causes it to issue the command to the host to remove the vCLS VM.

After the vpxd service has restarted, review infravisor.log on the ESXi host to confirm that the vCLS VM was powered off and destroyed.

/var/log/vmware/infravisor.log

No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Marked pod vcls/vcls-######-####-####-####-#### for deletion" namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Stopping pod: vcls/vcls-######-####-####-####-####"
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Performing VM Operation" VM-OP=PowerOff namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="stopped watching configstore ID esx/infravisorpods/vcls"
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Received notification that configstore ID esx/infravisorpods/vcls changed"
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Exiting pod-updater routine for configstore ID esx/infravisorpods/vcls"
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="VM Operation succeeded" VM-OP=PowerOff namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Performing VM Operation" VM-OP=Destroy namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="VM Operation succeeded" VM-OP=Destroy namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Destroy PodCRX /var/run/crx/infra/vCLS-######-####-####-####-####/vCLS-######-####-####-####-####.vmx err :<nil>"
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="Deleted pod from provider" namespace=vcls pod=vcls-######-####-####-####-#### uid=####
No(5) infravisor [7978142]: time="YYYY-MM-DDTHH:MM:SS.SZ" level=info msg="DeletionGracePeriodSeconds set to 0" namespace=vcls pod=vcls-######-####-####-####-#### uid=####
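
The removal sequence in the entries above can also be confirmed programmatically. The following is a rough sketch, assuming the message texts and the `VM-OP` field appear as in the sample. The milestone list `EXPECTED_STEPS` and the function `removal_progress` are illustrative names chosen for this example, and the choice of three milestones is an assumption rather than a documented contract of the Infravisor service.

```python
import re

# Same key=value layout as in the sample entries; keys may contain '-'.
FIELD_RE = re.compile(r'([\w-]+)=("(?:[^"\\]|\\.)*"|\S+)')

# Milestones taken from the sample entries, in the order they appear.
# Each item is (msg text, VM-OP value or None when the entry has no VM-OP).
EXPECTED_STEPS = [
    ("VM Operation succeeded", "PowerOff"),
    ("VM Operation succeeded", "Destroy"),
    ("Deleted pod from provider", None),
]


def parse_fields(line):
    """Return the key/value pairs found on one infravisor.log line."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def removal_progress(lines, pod):
    """Count how many EXPECTED_STEPS were logged, in order, for one pod."""
    step = 0
    for line in lines:
        if step == len(EXPECTED_STEPS):
            break
        fields = parse_fields(line)
        if fields.get("pod") != pod:
            continue
        msg, vm_op = EXPECTED_STEPS[step]
        if fields.get("msg") == msg and fields.get("VM-OP") == vm_op:
            step += 1
    return step


def _entry(msg, vm_op=None, pod="vcls-aaaa"):
    """Build a hypothetical log line shaped like the entries in this article."""
    op = f" VM-OP={vm_op}" if vm_op else ""
    return (f'No(5) infravisor [1]: time="t" level=info msg="{msg}"{op} '
            f'namespace=vcls pod={pod} uid=1')


SAMPLE_LINES = [
    _entry("Performing VM Operation", "PowerOff"),
    _entry("VM Operation succeeded", "PowerOff"),
    _entry("Performing VM Operation", "Destroy"),
    _entry("VM Operation succeeded", "Destroy"),
    _entry("Deleted pod from provider"),
]

done = removal_progress(SAMPLE_LINES, "vcls-aaaa")
print(f"{done} of {len(EXPECTED_STEPS)} removal steps logged")
```

A result lower than `len(EXPECTED_STEPS)` suggests that the removal had not finished, or had not started, in the portion of the log that was examined.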