DRS processes fail to Run: "vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS"

Article ID: 395205

Products

VMware vCenter Server VMware vCenter Server 8.0

Issue/Introduction

  • Cluster Service health shows "Degraded"

  • The vSphere Cluster Services (vCLS) virtual machines are in a powered-off state. When attempting to power on the affected vCLS VMs on the cluster nodes, the operation fails with the following explicit error:

    "Feature 'cpuid.mwait' was 0, but must be 1"

  • The virtual machine log (/var/run/crx/infra/<vCLS_VM_Name>/vmware.log) indicates that the FeatureCompatLate module fails to power on because the underlying host does not expose the required MWAIT instruction.

    YYYY-MM-DDTHH:MM:SS  In(05) vmx - FeatureCompat: Failed Requirements:
    YYYY-MM-DDTHH:MM:SS  In(05) vmx - VM Features Required: cpuid.mwait - Num:Match:1
    YYYY-MM-DDTHH:MM:SS  In(05) vmx - Module 'FeatureCompatLate' power on failed.
    YYYY-MM-DDTHH:MM:SS In(05)+ vmx - Power on failure messages: Feature 'cpuid.mwait' was 0, but must be 0x1.
    YYYY-MM-DDTHH:MM:SS In(05)+ vmx - Module 'FeatureCompatLate' power on failed.
    YYYY-MM-DDTHH:MM:SS In(05)+ vmx - Failed to start the virtual machine.
    YYYY-MM-DDTHH:MM:SS In(05)+ vmx -
    YYYY-MM-DDTHH:MM:SS In(05) vmx - Vix: [mainDispatch.c:4211]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
    YYYY-MM-DDTHH:MM:SS In(05) vmx - Transitioned vmx/execState/val to poweredOff
    YYYY-MM-DDTHH:MM:SS In(05) vmx - Vix: [mainDispatch.c:4211]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
    YYYY-MM-DDTHH:MM:SS In(05) vmx - Vix: [mainDispatch.c:4251]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
    YYYY-MM-DDTHH:MM:SS In(05) vmx - Vix: [mainDispatch.c:4211]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0

  • /var/run/log/infravisor.log shows that the power-on operation is rejected because of the CPU feature mismatch.

    YYYY-MM-DDTHH:MM:SS No(5) infravisor[41306843]: time="YYYY-MM-DDTHH:MM:SS" level=error msg="unexpected fault: &{{{{<nil> [{{} msg.featurecompat.requirement.number.mismatch [{{} 1 cpuid.mwait} {{} 2 0} {{} 3 1}] Feature 'cpuid.mwait' was 0, but must be 1.} {{} msg.moduletable.powerOnFailed [{{} 1 FeatureCompatLate}] Module 'FeatureCompatLate' power on failed. } {{} msg.vmx.poweron.failed [] Failed to start the virtual machine.}]}}} Feature 'cpuid.mwait' was 0, but must be 1.} taskerror: Feature 'cpuid.mwait' was 0, but must be 1." VM-OP=PowerOn namespace=vcls pod=vcls-1####2c-c##1-1##d-9##a-06####b9c2b uid=b3#####819a124###198c###
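As a rough illustration of the log checks above, the following Python sketch scans log text for the feature-mismatch message and reports which feature was rejected. The function name find_feature_mismatches and the regular expression are assumptions made for this example, based only on the message format quoted in this article; they are not part of any VMware tool.

```python
import re

# Matches the power-on failure text quoted in this article, for example:
#   Feature 'cpuid.mwait' was 0, but must be 0x1.
MISMATCH_RE = re.compile(
    r"Feature '([^']+)' was (0x[0-9a-fA-F]+|\d+), but must be (0x[0-9a-fA-F]+|\d+)"
)


def _to_int(text):
    """Convert a decimal or 0x-prefixed hexadecimal string to an int."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def find_feature_mismatches(log_text):
    """Return a sorted list of unique (feature, actual, required) tuples."""
    found = set()
    for feature, actual, required in MISMATCH_RE.findall(log_text):
        found.add((feature, _to_int(actual), _to_int(required)))
    return sorted(found)
```

Passing the contents of a saved copy of vmware.log or infravisor.log to find_feature_mismatches would return an entry for cpuid.mwait when the host shows this issue, and an empty list when the message is absent.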

Cause

  • This issue occurs when the MONITOR/MWAIT CPU instructions are disabled in the BIOS/UEFI settings of the ESXi host hardware.

  • vCLS VMs require the cpuid.mwait feature. If the host hardware does not expose it, the power-on feature compatibility check (FeatureCompatLate) blocks the vCLS VM from starting. With no vCLS VMs running, vSphere Cluster Services becomes unhealthy and DRS processes fail.

Resolution

To resolve this issue, enable the required CPU feature at the hardware level, then recreate the vCLS VMs so that they pick up the new CPU capabilities.

Step 1: Enable MWAIT in the Host BIOS

  1. Place the affected ESXi host(s) into Maintenance Mode.

  2. Reboot the host and access the physical hardware BIOS/UEFI configuration utility.

  3. Locate the CPU/Processor settings and explicitly enable the Monitor/MWAIT option. (Note: The exact naming and location of this setting varies by hardware vendor; consult your vendor documentation if necessary).

  4. Save the BIOS settings and boot the host back into ESXi.

  5. Exit Maintenance Mode.

Step 2: Cycle vCLS Retreat Mode

To force the cluster to deploy fresh vCLS VMs that recognize the newly enabled cpuid.mwait feature on the host, cycle Retreat Mode on the cluster:

  1. Put the affected cluster into Retreat Mode. This will safely tear down the existing, broken vCLS VMs.

  2. Remove the cluster from Retreat Mode.

  3. vCenter will automatically deploy new vCLS VMs, power them on, and restore DRS functionality.

For detailed instructions on executing this step, refer to the official documentation: Putting a Cluster in Retreat Mode.
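The Retreat Mode documentation describes toggling the mode through a per-cluster vCenter advanced setting in releases that use that mechanism. As a hedged sketch, the helper below only builds the setting name and value for a given cluster; it does not contact vCenter. The key format config.vcls.clusters.domain-c<number>.enabled is not stated in this article and is an assumption to verify against the linked documentation, and the function name retreat_mode_setting is hypothetical.

```python
import re


def retreat_mode_setting(cluster_moref, retreat):
    """Build the (key, value) pair for the assumed Retreat Mode setting.

    cluster_moref: cluster managed object ID such as 'domain-c8'
                   (the value seen in the vSphere Client URL for the cluster).
    retreat:       True to enter Retreat Mode, False to leave it.
    """
    if not re.fullmatch(r"domain-c\d+", cluster_moref):
        raise ValueError(f"unexpected cluster ID: {cluster_moref!r}")
    # Assumed key format; confirm it in the Retreat Mode documentation.
    key = f"config.vcls.clusters.{cluster_moref}.enabled"
    # 'false' disables vCLS on the cluster (Retreat Mode); 'true' re-enables it.
    value = "false" if retreat else "true"
    return key, value
```

For the two steps above, the pair returned with retreat=True would be applied first, and the pair returned with retreat=False would be applied once the old vCLS VMs have been removed.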

Additional Information

  • To check whether the MWAIT/MONITOR bit is enabled in the BIOS, perform the following steps:
    • SSH to the ESXi host.
    • Run esxcfg-info | less
    • Search for CPU and find the start of the CPU information.
    • Look for CPUID 1 (shown as "CPU ID id1") and note the corresponding ECX register value.
    • Check the last hexadecimal digit of that value. MONITOR/MWAIT is bit 3 of ECX, so in the examples below the digit is f when the feature is enabled and 7 when it is disabled.
    • If that bit is not set, the MWAIT/MONITOR bit is not enabled in the BIOS.
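The bit check in the steps above can be sketched in Python as follows. MONITOR/MWAIT is reported in bit 3 of ECX for CPUID leaf 1; the function name mwait_exposed is an assumption made for this illustration.

```python
MONITOR_MWAIT_BIT = 3  # CPUID leaf 1, ECX bit 3 reports MONITOR/MWAIT support


def mwait_exposed(ecx):
    """Return True if the MONITOR/MWAIT bit is set in a CPUID.1 ECX value.

    Accepts an int or a hexadecimal string such as '0x77fefbf7'.
    """
    value = int(ecx, 16) if isinstance(ecx, str) else ecx
    return bool((value >> MONITOR_MWAIT_BIT) & 1)
```

With the ECX values from the examples in this article, mwait_exposed("0x77fefbf7") evaluates to False and mwait_exposed("0x77fefbff") evaluates to True. Testing bit 3 directly is slightly more robust than comparing the last digit to f, because the other three bits in that digit belong to unrelated features.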


Following is an example snippet displaying CPU information for a host that does not have MWAIT/MONITOR enabled:
 
\==+CPU Info :
|----Num Cores.......................................8
\==+Cpu Cores :
\==+CpuImpl :
|----ID........................................0
|----Family....................................6
|----Model.....................................63
|----Type......................................0
|----Stepping..................................2
|----Name......................................GenuineIntel
|----CPU Speed.................................2599997669
|----Bus Speed.................................99999908
|----APIC ID...................................0x00000000
|----Node......................................0
\==+CPU ID id0 :
|----EAX....................................0x0000000f
|----EBX....................................0x756e6547
|----ECX....................................0x6c65746e
|----EDX....................................0x49656e69
\==+CPU ID id1 :
|----EAX....................................0x000306f2
|----EBX....................................0x00100800
|----ECX....................................0x77fefbf7
|----EDX....................................0xbfebfbff
 
 
 
 
Following is an example for the same host when MWAIT/MONITOR is enabled:

\==+CPU Info :
|----Num Cores.......................................8
\==+Cpu Cores :
\==+CpuImpl :
|----ID........................................0
|----Family....................................6
|----Model.....................................63
|----Type......................................0
|----Stepping..................................2
|----Name......................................GenuineIntel
|----CPU Speed.................................2599997929
|----Bus Speed.................................99999901
|----APIC ID...................................0x00000000
|----Node......................................0
\==+CPU ID id0 :
|----EAX....................................0x0000000f
|----EBX....................................0x756e6547
|----ECX....................................0x6c65746e
|----EDX....................................0x49656e69
\==+CPU ID id1 :
|----EAX....................................0x000306f2
|----EBX....................................0x00100800
|----ECX....................................0x77fefbff
|----EDX....................................0xbfebfbff
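To avoid reading the register values by eye, the two examples above can also be processed with a short Python sketch that pulls the ECX value out of each "CPU ID id1" section of saved esxcfg-info output. The function name cpuid1_ecx_values and the line patterns are assumptions based solely on the output format shown in this article; real output may differ between ESXi builds.

```python
import re

# Heading of the CPUID leaf 1 subtree, e.g. "\==+CPU ID id1 :"
LEAF1_HEADING = re.compile(r"\\==\+CPU ID id1\s*:")
# Register line, e.g. "|----ECX....................................0x77fefbf7"
ECX_LINE = re.compile(r"\|-+ECX\.+(0x[0-9a-fA-F]+)")


def cpuid1_ecx_values(esxcfg_info_text):
    """Return the ECX values (as ints) found under each 'CPU ID id1' heading."""
    values = []
    in_leaf1 = False
    for line in esxcfg_info_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("\\==+"):
            # Any new subtree heading ends the previous section.
            in_leaf1 = LEAF1_HEADING.fullmatch(stripped) is not None
            continue
        if in_leaf1:
            match = ECX_LINE.fullmatch(stripped)
            if match:
                values.append(int(match.group(1), 16))
    return values
```

Each returned value can then be tested with value & 0x8 (bit 3); a result of zero means MWAIT/MONITOR is not exposed for that core.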