Unable to place ESXi host into Maintenance Mode in a vSAN cluster


Article ID: 432485

Updated On:

Products

VMware vSAN

Issue/Introduction

  • You are unable to place an ESXi host into maintenance mode within a vSAN cluster.
  • The maintenance mode task hangs indefinitely or fails to complete.
  • vSAN Skyline Health reports that the host is network partitioned.
  • A review of the physical server indicates that the host may have experienced a hardware (NIC) issue that partitioned or isolated it from the rest of the cluster, preventing normal cluster operations and virtual machine (VM) migrations.
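
The partition symptom can also be cross-checked from the ESXi shell. The following is a minimal sketch, assuming the `Sub-Cluster Member Count` field printed by `esxcli vsan cluster get`; the function names and the expected host count are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Sketch only: the function names below are illustrative, not VMware tooling.

# Read `esxcli vsan cluster get` output on stdin and print the number of
# cluster members this host can currently see.
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count:/ { print $2 }'
}

# Compare the member count seen by this host with the number of hosts
# you expect in the cluster (passed as $1, an assumed value you supply).
check_vsan_membership() {
    expected=$1
    actual=$(vsan_member_count)
    if [ "$actual" = "$expected" ]; then
        echo "OK: host sees $actual of $expected cluster members"
    else
        echo "POSSIBLE PARTITION: host sees ${actual:-0} of $expected cluster members"
        return 1
    fi
}
```

On the affected host this might be used as `esxcli vsan cluster get | check_vsan_membership 4`, where `4` stands in for the number of hosts in your cluster. A lower count than expected is consistent with the partition reported by Skyline Health, but it does not by itself identify the faulty NIC.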

Environment

vSAN (all versions)

Cause

The issue is caused by stale or hung Virtual Machine processes residing on the partitioned host.

When a host experiences a hardware failure and partitions from the vSAN cluster, the VMs running on that host can become unresponsive. These VMs fail to migrate or power down gracefully, leaving behind hung vmx processes. The ESXi host cannot complete the transition into maintenance mode as long as these unresponsive processes remain active and hold system locks.

Resolution

To resolve this issue, identify and manually kill the stale VM processes from the ESXi command line. Connect to the ESXi host over SSH as the root user, then follow these steps:

  1. Get a list of running virtual machines, identified by World ID, UUID, Display Name, and path to the .vmx configuration file:
    • esxcli vm process list 
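  2. Power off the virtual machine by running one of these commands, replacing WorldID with the World ID reported in Step 1. The soft type is the most graceful, hard performs an immediate shutdown, and force should be used only as a last resort:
    • esxcli vm process kill -t soft -w WorldID
    • esxcli vm process kill -t hard -w WorldID
    • esxcli vm process kill -t force -w WorldID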
  3. Run the command from Step 1 again to validate that the virtual machine is no longer running. If it is still listed, repeat Step 2 with the next kill type.
  4. Attempt to enter maintenance mode again.
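
The list, kill, and validate steps above can be sketched as a small set of shell helpers. This is an illustration under stated assumptions, not a supported VMware script: the function names and the `WAIT_SECONDS` delay are hypothetical, and the parsing assumes the `World ID:` and `Display Name:` fields that `esxcli vm process list` prints for each VM.

```shell
#!/bin/sh
# Sketch only: helper names and the wait interval are assumptions.

# Seconds to wait after each kill attempt before re-checking (assumed value).
WAIT_SECONDS=${WAIT_SECONDS:-10}

# Read `esxcli vm process list` output on stdin and print
# "WorldID<TAB>DisplayName" for each running VM.
list_vm_worlds() {
    awk '
        $1 == "World" && $2 == "ID:" { wid = $3 }
        /^ *Display Name:/ {
            sub(/^ *Display Name: */, "")
            print wid "\t" $0
        }
    '
}

# Read `esxcli vm process list` output on stdin; succeed only if the
# given World ID is still present.
world_is_running() {
    awk -v wid="$1" '
        $1 == "World" && $2 == "ID:" && $3 == wid { found = 1 }
        END { exit !found }
    '
}

# Try soft, then hard, then force, stopping as soon as the world is gone.
kill_vm_world() {
    wid=$1
    for kill_type in soft hard force; do
        esxcli vm process kill -t "$kill_type" -w "$wid" ||
            echo "kill -t $kill_type returned an error" >&2
        sleep "$WAIT_SECONDS"
        if ! esxcli vm process list | world_is_running "$wid"; then
            echo "World $wid stopped after $kill_type kill"
            return 0
        fi
    done
    echo "World $wid still running after force kill" >&2
    return 1
}
```

A possible session would be `esxcli vm process list | list_vm_worlds` to pick the World ID of a hung VM, followed by `kill_vm_world WorldID`. Escalating automatically is a design choice made for brevity here; on a production host you may prefer to run each kill type by hand and inspect the result before moving to the next. If you retry maintenance mode from the command line rather than the vSphere Client, `esxcli system maintenanceMode set -e true -m ensureObjectAccessibility` is one option, with the vSAN data evacuation mode chosen to suit your situation.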