All regular virtual machines migrate out of a host in a DRS-enabled cluster



Article ID: 321981


Products

VMware vCenter Server

Issue/Introduction

Symptoms:

  • Agent virtual machine has been deployed via EAM.
  • The Agent VM is powered off or unhealthy.
  • DRS immediately migrates regular virtual machines out of the ESXi host, even when they are manually relocated to the host.
  • The vCenter vpxd log may contain messages similar to:
2023-02-27T16:27:24.160+08:00 info vpxd[26656] [Originator@6876 sub=cdrsPlmt opID=CdrsLoadBalancer-3ab50c2a] Vm [vim.VirtualMachine:vm-####,VMNAME] failed constraint check false on host [vim.HostSystem:host-####,ESXIHOSTNAME] with <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:vim25" versionId="7.0.1.1" xsi:type="LocalizedMethodFault"><fault xsi:type="InsufficientAgentVmsDeployed"><hostName>ESXIHOSTNAME</hostName><requiredNumAgentVms>1</requiredNumAgentVms><currentNumAgentVms>0</currentNumAgentVms></fault><localizedMessage></localizedMessage></obj>

This message shows that a regular virtual machine failed the DRS constraint check on the host: the constraint requires requiredNumAgentVms = 1, but the host reports currentNumAgentVms = 0, so the violation reason is InsufficientAgentVmsDeployed.
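The two counts can be pulled out of a vpxd fault line with a quick shell sketch; the sample line below is abbreviated from the message above, and on a live system you would grep the vpxd log instead:

```shell
# Extract the agent-VM counts from a vpxd fault line (abbreviated sample from above):
line='<fault xsi:type="InsufficientAgentVmsDeployed"><hostName>ESXIHOSTNAME</hostName><requiredNumAgentVms>1</requiredNumAgentVms><currentNumAgentVms>0</currentNumAgentVms></fault>'
required=$(printf '%s' "$line" | sed -n 's/.*<requiredNumAgentVms>\([0-9]*\)<.*/\1/p')
current=$(printf '%s' "$line" | sed -n 's/.*<currentNumAgentVms>\([0-9]*\)<.*/\1/p')
echo "required=$required current=$current"
```

If required is greater than current, DRS treats the host as violating the agent-VM constraint and evacuates regular VMs from it.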



Environment

VMware vCenter Server 7.0.x

Resolution

Follow these steps to check the Agent virtual machine on the ESXi host:
1) Check whether the Agent VM has been deployed. SSH to the ESXi host and run:

# /opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig

Sample output:
....
<agentVmList>
<vmCfgFilePath>/vmfs/volumes/vsan:5############f44-e##########85/e63#####-####-####-####-######32350/AGENTVMNAME</vmCfgFilePath>
<hostId>host-######</hostId>
</agentVmList>


The hostId is the MOID of the ESXi host.
An agentVmList entry like the one above indicates that the host has an Agent VM; if no such entry is present, the host does not.
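Rather than reading the whole clusterconfig dump by eye, the hostId can be filtered out of an agentVmList entry with sed. A minimal sketch against sample data — the path and hostId below are placeholders, not from a real host; on a live host you would pipe the prettyPrint.sh output instead:

```shell
# Extract the hostId from an agentVmList entry. On a live ESXi host, replace
# the sample with:  /opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig
sample='<agentVmList>
<vmCfgFilePath>/vmfs/volumes/datastore1/agent-vm/agent-vm.vmx</vmCfgFilePath>
<hostId>host-123</hostId>
</agentVmList>'
host_id=$(printf '%s\n' "$sample" | sed -n 's/.*<hostId>\(host-[0-9]*\)<.*/\1/p')
echo "$host_id"
```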

2) If the output shows that the Agent VM has been deployed, check whether it is powered on or off. If it has not been deployed, try deploying it again and verify that the deployment succeeds.
3) If the Agent VM is powered on but still unhealthy, or cannot be deployed, check the vCenter EAM service log.
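For the power-state check in step 2, vim-cmd on the ESXi host can be used: `vim-cmd vmsvc/getallvms` lists registered VMs and their Vmids, and `vim-cmd vmsvc/power.getstate <Vmid>` reports the power state. A sketch of extracting the Vmid from getallvms-style output — the VM name and Vmid are illustrative assumptions, not real values:

```shell
# On the ESXi host:
#   vim-cmd vmsvc/getallvms              # list VMs and their Vmids
#   vim-cmd vmsvc/power.getstate <Vmid>  # reports Powered on / Powered off
# Illustrative parse of getallvms-style output (sample data, not a real host):
sample='Vmid   Name          File
7      agent-vm-01   [ds1] agent-vm-01/agent-vm-01.vmx'
vmid=$(printf '%s\n' "$sample" | awk '$2 == "agent-vm-01" {print $1}')
echo "$vmid"
```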

In the sample case below, the Agent VM powered on successfully, but its status never returned to GREEN:

2023-02-24T08:13:17.315Z |  INFO | vim-inv-update | VirtualMachinePropertyChangeHandler.java | 243 | VM: vm-######power state set to poweredOn
2023-02-24T08:13:17.349Z |  INFO | host-agent-1 | AgentWorkflowListener.java | 135 | HostAgent(ID: 'Agent:e7c#####-####-####-####-##########fb:null') is waiting for a hook, provisioned: false, poweredOn: true, prePowerOn: false, keeping it yellow until hooks are processed.


The hooks were never processed, which left the underlying cluster in a locked state. The Agent virtual machine could not process the hook to change its status to green, meaning it was not ready even though it had been deployed.

The EAM log also shows that the Agent virtual machine was deployed by NSX.
Next, check the NSX cm-inventory.log, which may contain messages such as:

2023-02-27T10:14:30.304Z ERROR http-nio-127.0.0.1-7443-exec-1 VcCommunicator 4418 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP40500" level="ERROR" subcomp="cm-inventory"] Unable to create connection to cm with id: 5ff472e4-caec-49cc-a821-0998588383a0
com.vmware.vim.vmomi.client.exception.SslException: javax.net.ssl.SSLHandshakeException: com.vmware.nsx.management.security.ThumbprintMismatchException: 557878ba914a67892bd8c9011b26ce06be3716993d7b4ea3732ee1142ea6f72c


This indicates that NSX cannot communicate with vCenter Server because of an SSL thumbprint mismatch, which prevents the hook from being released.
Manually update the vCenter credentials in the NSX management portal; this forces NSX to retrieve the correct SSL thumbprint of the vCenter Server.
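To confirm the mismatch before updating the credentials, compare the digest in the NSX log against vCenter's actual certificate fingerprint. A sketch, assuming VCENTER_FQDN is a placeholder for your vCenter address — openssl prints the fingerprint as colon-separated uppercase hex, while the NSX log shows a bare lowercase digest, so normalize before comparing:

```shell
# Fetch vCenter's certificate fingerprint (run where VCENTER_FQDN resolves):
#   echo | openssl s_client -connect VCENTER_FQDN:443 2>/dev/null \
#     | openssl x509 -noout -fingerprint -sha256
# openssl prints e.g. "SHA256 Fingerprint=55:78:78:BA:...", but the NSX log
# shows a bare lowercase digest; normalize the openssl form before comparing:
fp='55:78:78:BA:91:4A'   # truncated sample of an openssl-style fingerprint
normalized=$(printf '%s' "$fp" | tr -d ':' | tr 'A-Z' 'a-z')
echo "$normalized"
```

If the normalized fingerprint differs from the digest in the ThumbprintMismatchException, updating the vCenter credentials in NSX should resolve it.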