All the regular virtual machines migrate out of the host in a DRS enabled cluster

Article ID: 321981

Products

VMware vCenter Server

Issue/Introduction

  • An Agent virtual machine has been deployed via ESX Agent Manager (EAM).
  • The Agent VM is powered off or unhealthy.
  • DRS immediately migrates regular virtual machines off the ESXi host, even after they are manually relocated to that host.
  • The vCenter vpxd log (/var/log/vmware/vpxd/vpxd.log) may contain messages similar to the following:

2023-02-27T16:27:24.160+08:00 info vpxd[26656] [Originator@6876 sub=cdrsPlmt opID=CdrsLoadBalancer-3ab50c2a] Vm [vim.VirtualMachine:vm-####,VMNAME] failed constraint check false on host [vim.HostSystem:host-####,ESXIHOSTNAME] with <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:vim25" versionId="7.0.1.1" xsi:type="LocalizedMethodFault"><fault xsi:type="InsufficientAgentVmsDeployed"><hostName>ESXIHOSTNAME</hostName><requiredNumAgentVms>1</requiredNumAgentVms><currentNumAgentVms>0</currentNumAgentVms></fault><localizedMessage></localizedMessage></obj>

Environment

VMware vCenter Server 7.0.x
VMware NSX

Cause

These messages show that a regular virtual machine failed the DRS constraint check on the host. The DRS configuration requires requiredNumAgentVms = 1, but the host reports currentNumAgentVms = 0, so the constraint is violated with the fault InsufficientAgentVmsDeployed.
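
To find out which hosts are affected, the fault fields can be pulled out of the vpxd log. The following is a minimal Python sketch, assuming the fault is logged on a single line in the same shape as the sample above. The function name find_agent_vm_faults and the shortened sample line are illustrative and not part of any VMware tool.

```python
import re

# Matches the InsufficientAgentVmsDeployed fault as it appears in the sample log line.
# Assumption: the three child elements always appear in this order on one line.
FAULT_RE = re.compile(
    r'<fault xsi:type="InsufficientAgentVmsDeployed">'
    r"<hostName>(?P<host>[^<]*)</hostName>"
    r"<requiredNumAgentVms>(?P<required>\d+)</requiredNumAgentVms>"
    r"<currentNumAgentVms>(?P<current>\d+)</currentNumAgentVms>"
)


def find_agent_vm_faults(lines):
    """Yield (host, required, current) for each log line containing the fault."""
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            yield (
                match.group("host"),
                int(match.group("required")),
                int(match.group("current")),
            )


# Shortened copy of the fault element from the sample message (placeholders kept as-is).
sample = (
    '<fault xsi:type="InsufficientAgentVmsDeployed">'
    "<hostName>ESXIHOSTNAME</hostName>"
    "<requiredNumAgentVms>1</requiredNumAgentVms>"
    "<currentNumAgentVms>0</currentNumAgentVms></fault>"
)

for host, required, current in find_agent_vm_faults([sample]):
    print(f"{host}: required={required}, current={current}")
```

In practice the lines would come from an open file handle on the vpxd log rather than from the inline sample.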

Resolution

Follow the steps below to check the Agent virtual machine on the ESXi host:

1) Check whether the Agent VM is deployed. SSH to the ESXi host and run the following command:

# /opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig

Sample output:
....
<agentVmList>
<vmCfgFilePath>/vmfs/volumes/vsan:5############f44-e##########85/e63#####-####-####-####-######32350/AGENTVMNAME</vmCfgFilePath>
<hostId>host-######</hostId>
</agentVmList>


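The hostId is the managed object ID (MOID) of the ESXi host.
An agentVmList entry containing the host's MOID indicates that an Agent VM is deployed on that host. If no such entry exists, no Agent VM is deployed on it.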
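
When the cluster has many hosts, it can help to group the agentVmList entries by hostId. The sketch below is one possible way to do that in Python, assuming the command output has been saved as text and that each entry has the layout shown in the sample. The function name agent_vms_by_host, the datastore path and the host ID host-1001 are hypothetical.

```python
import re

# One <agentVmList> entry: a config file path followed by the host MOID.
# Assumption: entries look like the sample output (path first, then hostId).
AGENT_VM_RE = re.compile(
    r"<agentVmList>\s*"
    r"<vmCfgFilePath>(?P<path>[^<]*)</vmCfgFilePath>\s*"
    r"<hostId>(?P<host_id>[^<]*)</hostId>\s*"
    r"</agentVmList>"
)


def agent_vms_by_host(clusterconfig_text):
    """Map each host MOID to the list of Agent VM config paths registered for it."""
    result = {}
    for match in AGENT_VM_RE.finditer(clusterconfig_text):
        result.setdefault(match.group("host_id"), []).append(match.group("path"))
    return result


# Hypothetical values standing in for the masked ones in the sample output.
sample = """
<agentVmList>
<vmCfgFilePath>/vmfs/volumes/example-datastore/example-dir/AGENTVMNAME</vmCfgFilePath>
<hostId>host-1001</hostId>
</agentVmList>
"""

print(agent_vms_by_host(sample))
```

A host MOID that is missing from the returned dictionary has no Agent VM entry in the cluster configuration.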

2) If the output shows that the Agent VM has been deployed, check whether it is powered on. If it has not been deployed, try deploying it again and check whether the deployment succeeds.

3) If the Agent VM is powered on, or is not deployed, check the vCenter EAM service log (/var/log/vmware/eam/eam.log).

In the sample below, the Agent VM was powered on successfully, but its status never returned to green:

2023-02-24T08:13:17.315Z |  INFO | vim-inv-update | VirtualMachinePropertyChangeHandler.java | 243 | VM: vm-######power state set to poweredOn
2023-02-24T08:13:17.349Z |  INFO | host-agent-1 | AgentWorkflowListener.java | 135 | HostAgent(ID: 'Agent:e7c#####-####-####-####-##########fb: null') is waiting for a hook, provisioned: false, poweredOn: true, prePowerOn: false, keeping it yellow until hooks are processed.


The hooks were not called, which left the underlying cluster in a locked state. The Agent virtual machine cannot process the hook and change its status to green, which means the Agent VM is not ready even though it has been deployed and powered on.
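
Agents stuck in this state can be listed by scanning eam.log for the "waiting for a hook" message. The following Python sketch assumes the line format of the sample above. The function name waiting_agents and the agent ID in the sample line are hypothetical.

```python
import re

# Captures the timestamp and the agent ID from a "waiting for a hook" line.
# Assumption: the line starts with the timestamp and quotes the agent ID, as in the sample.
WAITING_RE = re.compile(
    r"^(?P<ts>\S+) .*HostAgent\(ID: '(?P<agent>[^']*)'\) is waiting for a hook"
)


def waiting_agents(lines):
    """Return (timestamp, agent_id) for every agent reported as waiting for a hook."""
    found = []
    for line in lines:
        match = WAITING_RE.match(line)
        if match:
            found.append((match.group("ts"), match.group("agent")))
    return found


# Same shape as the sample log line, with a made-up agent ID.
sample_line = (
    "2023-02-24T08:13:17.349Z |  INFO | host-agent-1 | AgentWorkflowListener.java | 135 | "
    "HostAgent(ID: 'Agent:example-agent-id: null') is waiting for a hook, "
    "provisioned: false, poweredOn: true, prePowerOn: false, "
    "keeping it yellow until hooks are processed."
)

for timestamp, agent_id in waiting_agents([sample_line]):
    print(timestamp, agent_id)
```

An agent that keeps appearing in this list long after power-on is a candidate for the hook problem described here.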

The EAM log (/var/log/vmware/eam/eam.log) also shows that the Agent virtual machine was deployed from NSX.

4) Then check the NSX log /var/log/cm-inventory/cm-inventory.log. It may contain messages similar to the following:

2023-02-27T10:14:30.304Z ERROR http-nio-127.0.0.1-7443-exec-1 VcCommunicator 4418 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP40500" level="ERROR" subcomp="cm-inventory"] Unable to create connection to cm with id: 5f######4-c####-####-a821-09#########0
com.vmware.vim.vmomi.client.exception.SslException: javax.net.ssl.SSLHandshakeException: com.vmware.nsx.management.security.ThumbprintMismatchException: 557#####################################################


This indicates that NSX cannot communicate with the vCenter Server because of an SSL thumbprint mismatch, so the hook cannot be released.
Manually update the vCenter credentials in the NSX management portal to force NSX to pick up the correct SSL thumbprint of the vCenter Server.
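
To confirm which Compute Manager is affected, the connection error can be paired with the exception on the line that follows it. This is a rough Python sketch, assuming the two-line layout of the sample above. The function name find_thumbprint_mismatches and the ID example-cm-id are hypothetical.

```python
import re

# Captures the Compute Manager ID from the connection error line.
CONNECT_RE = re.compile(r"Unable to create connection to cm with id: (?P<cm_id>\S+)")


def find_thumbprint_mismatches(lines):
    """Return Compute Manager IDs whose connection error is immediately
    followed by a ThumbprintMismatchException line."""
    cm_ids = []
    pending_id = None
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            # Remember the ID and inspect the next line for the exception.
            pending_id = match.group("cm_id")
            continue
        if pending_id and "ThumbprintMismatchException" in line:
            cm_ids.append(pending_id)
        pending_id = None
    return cm_ids


# Abbreviated lines in the same shape as the sample, with a made-up ID.
sample_lines = [
    'ERROR VcCommunicator [nsx@6876 errorCode="MP40500"] '
    "Unable to create connection to cm with id: example-cm-id",
    "com.vmware.vim.vmomi.client.exception.SslException: "
    "javax.net.ssl.SSLHandshakeException: "
    "com.vmware.nsx.management.security.ThumbprintMismatchException: ...",
]

print(find_thumbprint_mismatches(sample_lines))
```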

To restore the Compute Manager connection:

    1. Log in to NSX Manager and navigate to System > Fabric > Compute Managers
    2. Select the Compute Manager and click Edit
    3. Enter the correct thumbprint in "SHA-256 thumbprint" and click Save
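
If the current thumbprint of the vCenter Server certificate is not at hand, it can be computed from the certificate that the server presents. The Python sketch below uses only the standard library. The colon-separated uppercase format is an assumption, so compare the result with the value shown in the NSX UI before saving. The function names and the host name vcenter.example.com are hypothetical.

```python
import hashlib
import ssl


def format_sha256_thumbprint(der_cert: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as colon-separated hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_sha256_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate presented by host:port and return its SHA-256 thumbprint.

    The certificate is retrieved without validation, which is what is needed
    to read the thumbprint of a self-signed or replaced certificate.
    """
    pem_cert = ssl.get_server_certificate((host, port))
    return format_sha256_thumbprint(ssl.PEM_cert_to_DER_cert(pem_cert))
```

For example, fetch_sha256_thumbprint("vcenter.example.com") would return the thumbprint of the certificate currently served on port 443 of that host.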