vpxd crashes frequently as DRS could not place vCLS VMs on hosts
Article ID: 427480

Updated On:

Products

VMware vCenter Server, VMware vSphere ESXi

Issue/Introduction

  • The vmware-vpxd service crashes frequently and generates vpxd core dumps.
  • All clusters experience vSphere HA failures, with the HA state stuck in the election process.
  • vpxd.log (log path: /var/log/vmware/vpxd) contains entries similar to the following. They show DRS repeatedly trying to place and power on the vCLS VM on hosts, and failing each time, while vSphere HA could not be configured because the cluster hosts were stuck in the election process.

YYYY-MM-DDTHH:MM info vpxd[06787] [Originator@6876 sub=MoCluster opID=WorkQueue-64370a3b] Received VM state change for HDCS VM; <domain_moid>, [vim.VirtualMachine:<VM_moid>,vCLS-########-ed07-4434-8da9-############] oldConnState: 1, newConnState: 3, oldPowerState: 0, newPowerState: 0
YYYY-MM-DDTHH:MM info vpxd[06787] [Originator@6876 sub=MoCluster opID=WorkQueue-64370a3b] Completed request from LRO request queue; {VclsPodCrxVmStateChanged([vim.HostSystem:<Host_moid>,<Host_fqdn>], [vim.VirtualMachine:<VM_moid>,vCLS-########-ed07-4434-8da9-############], power: 0->0, connect: 1->3), p: 00007fc188043990, attempt: 0}, e: N17RequestQueueInLro24QueueNotRunningExceptionE(LRO request queue is not running)
YYYY-MM-DDTHH:MM info vpxd[06787] [Originator@6876 sub=HostCnx opID=WorkQueue-64370a3b] [VpxdHostCnx::AddConnection] cnx: ########-0650-a8d5-c2ba-############, h: <Host_moid>

  • ESXi hosts are running 8.0 U2 or earlier, including ESXi 7.x.
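
To check whether a vpxd.log file carries the signature shown in the excerpt above, a small script can help. The following is a minimal sketch, not an official tool: the function name and the vCLS name pattern are assumptions made for illustration, and only the error text "LRO request queue is not running" is taken from the log excerpt.

```python
import re
from typing import Iterable, List

# Error text copied from the vpxd.log excerpt in this article.
SIGNATURE = "LRO request queue is not running"

# Assumption: vCLS VM names look like "vCLS-<uuid>", as in the excerpt.
# '#' is allowed so that redacted names from the article also match.
VCLS_NAME = re.compile(r"vCLS-[0-9A-Za-z#-]+")


def find_vcls_queue_errors(lines: Iterable[str]) -> List[str]:
    """Return vCLS VM names found on lines that carry the error signature.

    Names are de-duplicated and returned in order of first appearance.
    """
    names: List[str] = []
    for line in lines:
        if SIGNATURE in line:
            names.extend(VCLS_NAME.findall(line))
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(names))
```

The function accepts any iterable of lines, so it can be fed an open file handle for /var/log/vmware/vpxd/vpxd.log. An empty result means the signature was not found in that file; it does not rule out other causes of vpxd crashes.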

Environment

  • vCenter 8.x
  • ESXi 7.x
  • ESXi 8.0.2

Cause

The vCenter service (vmware-vpxd) crashes due to a fatal DRS exception. The exception is triggered during vCLS VM placement, when DRS cannot place the VM because the cluster is unhealthy.

Resolution

  1. Disable and then re-enable vSphere HA on all clusters that are stuck in the election phase. Refer to: Disabling and enabling VMware vSphere High Availability
  2. Enable Retreat Mode on the affected clusters by setting EDIT VCLS MODE to Retreat Mode. Refer to: Disable vCLS on a Cluster via Retreat Mode. Wait until the vCLS VMs are deleted, then disable Retreat Mode by setting EDIT VCLS MODE to System Managed.
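
Step 2 includes a wait for the vCLS VMs to be deleted before Retreat Mode is turned off. If that wait is scripted, it could be sketched roughly as below. This is an illustrative polling helper only: `list_vcls_vms` is a hypothetical callable that you would supply (for example, a wrapper around your own inventory query), and the default timeout and interval are arbitrary assumptions, not values from VMware documentation.

```python
import time
from typing import Callable, Sequence


def wait_for_vcls_removal(
    list_vcls_vms: Callable[[], Sequence[str]],  # hypothetical inventory query
    timeout_s: float = 600.0,    # assumed default, tune for your environment
    interval_s: float = 10.0,    # assumed default polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no vCLS VMs remain; return False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if not list_vcls_vms():
            return True   # all vCLS VMs are gone; Retreat Mode can be disabled
        if clock() >= deadline:
            return False  # VMs still present; investigate before proceeding
        sleep(interval_s)
```

The `sleep` and `clock` parameters are injected so the helper can be exercised without real delays. Only switch EDIT VCLS MODE back to System Managed after the helper returns True.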

Additional Information

  • Embedded vCLS (vSphere Cluster Services) was introduced in vSphere 8.0 Update 3. This updated architecture uses vSphere Pod technology to manage cluster services directly on the ESXi host. Embedded vCLS VMs run in host memory and do not use a datastore, so no datastore is needed for VM placement.
  • If ESXi hosts are running 8.0 U2 or earlier while vCenter is on 8.0 U3, vCenter deploys external vCLS VMs on those hosts for backward compatibility. These external vCLS VMs are deployed on datastores.
  • Upgrade all the ESXi hosts to 8.0 U3 to start using embedded vCLS.
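
The version rule described above can be expressed as a short check. The sketch below assumes vCenter is already on 8.0 U3 and that each host version is available as a (major, minor, update) triple; the type and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple


class EsxiVersion(NamedTuple):
    major: int
    minor: int
    update: int


# Per the article, embedded vCLS requires ESXi 8.0 Update 3.
EMBEDDED_VCLS_MIN = EsxiVersion(8, 0, 3)


def vcls_mode_for_host(version: EsxiVersion) -> str:
    """Return 'embedded' for 8.0 U3 and later, otherwise 'external'."""
    # NamedTuple instances compare field by field, like plain tuples.
    return "embedded" if version >= EMBEDDED_VCLS_MIN else "external"


def hosts_needing_upgrade(hosts: Dict[str, EsxiVersion]) -> List[str]:
    """List host names still below 8.0 U3, sorted alphabetically."""
    return sorted(name for name, v in hosts.items() if v < EMBEDDED_VCLS_MIN)
```

For example, a host at `EsxiVersion(8, 0, 2)` or `EsxiVersion(7, 0, 3)` is reported as using external vCLS and appears in the upgrade list, while a host at `EsxiVersion(8, 0, 3)` does not.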