Worker nodes fail to migrate when the host enters maintenance mode - Unable to automatically migrate from the host

Article ID: 403897

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

Worker nodes in a VKS cluster do not migrate automatically when their host enters maintenance mode, even though DRS is set to Fully Automated.

The migration fails with the following error: "Unable to automatically migrate from the host"

Environment

vSphere with Tanzu

vSphere Kubernetes Service

Cause

The vCenter log /var/log/vmware/vpxd/vpxd.log shows the following entry for the affected worker node:

YYYY-MM-DD info vpxd[#####] [Originator@##### sub=##### item=##### opID=#####] Vm [vim.VirtualMachine:<Workernode name>,#####] failed constraint check false on host [vim.HostSystem:#####,#####] with <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:vim25" versionId="#####.#####.#####"><fault xsi:type="NumVirtualCpusExceedsLimit"><maxSupportedVcpus>#####</maxSupportedVcpus></fault><localizedMessage></localizedMessage></obj>
  • The migration fails because placing the VM on another host would exceed the maximum number of vCPUs that DRS allows in the cluster (fault NumVirtualCpusExceedsLimit).
  • The CPU overcommitment ratio is configured at the cluster level under DRS. It is an optional setting that enforces a maximum vCPU:pCPU ratio in the cluster. Once the cluster reaches the configured value, no additional VMs are allowed to power on or migrate.
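To confirm the cause on a given system, the log can be searched for this fault. The following is a minimal Python sketch, not a supported tool. It assumes the log entry has the shape shown above. The function name find_vcpu_limit_faults and the sample values in the usage note are hypothetical, and real entries may differ in detail.

```python
import re

# Pattern modeled on the vpxd.log entry shown above (assumed shape).
FAULT_RE = re.compile(
    r'Vm \[vim\.VirtualMachine:(?P<vm>[^\]]*)\] failed constraint check .*?'
    r'on host \[vim\.HostSystem:(?P<host>[^\]]*)\].*?'
    r'xsi:type="NumVirtualCpusExceedsLimit">\s*'
    r'<maxSupportedVcpus>(?P<limit>[^<]*)</maxSupportedVcpus>'
)


def find_vcpu_limit_faults(lines):
    """Return one dict per log line that reports NumVirtualCpusExceedsLimit."""
    hits = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            hits.append({
                "vm": match.group("vm"),
                "host": match.group("host"),
                "max_supported_vcpus": match.group("limit"),
            })
    return hits
```

A possible usage, run against a copy of the log, is `find_vcpu_limit_faults(open("vpxd.log", encoding="utf-8", errors="replace"))`. Each returned entry names the VM and host references and the vCPU limit that the constraint check reported.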

Resolution

Either disable the CPU overcommitment limit or raise it so that the cluster can accommodate more vCPUs:

Navigate to Cluster > Configure > DRS > Edit cluster settings > Additional options, then either uncheck CPU overcommitment ratio or increase the ratio.

Note:

The maximum number of vCPUs per DRS cluster is calculated as follows: Total pCPUs in the cluster x (CPU overcommitment / 100)
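
As a worked illustration of this formula, the Python sketch below computes the limit and checks whether one more VM would fit. The function names and the sample numbers are hypothetical. Rounding the result down to a whole vCPU is an assumption of this sketch, not documented DRS behavior.

```python
def max_cluster_vcpus(total_pcpus: int, overcommit_percent: int) -> int:
    """Total pCPUs in the cluster x (CPU overcommitment / 100).

    Rounding down to a whole vCPU is an assumption of this sketch.
    """
    return total_pcpus * overcommit_percent // 100


def fits_under_limit(current_vcpus: int, vm_vcpus: int,
                     total_pcpus: int, overcommit_percent: int) -> bool:
    """True if adding a VM with vm_vcpus keeps the cluster at or under the limit."""
    limit = max_cluster_vcpus(total_pcpus, overcommit_percent)
    return current_vcpus + vm_vcpus <= limit


# Hypothetical cluster: 64 pCPUs with CPU overcommitment set to 400.
limit = max_cluster_vcpus(64, 400)            # 64 x (400 / 100) = 256
blocked = not fits_under_limit(250, 8, 64, 400)  # 250 + 8 = 258, above 256
```

In this hypothetical cluster, an 8-vCPU worker node cannot be placed while 250 vCPUs are already counted against the limit. Raising the setting to 500 would lift the limit to 320 vCPUs and let the node be placed.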