ESXi Host Becomes Unresponsive During Nutanix LCM Pre-Upgrade Checks

Article ID: 414025

Products

VMware vSphere ESXi

Issue/Introduction

This article addresses an issue where an ESXi host becomes unresponsive or enters a “Not Responding” state while performing pre-upgrade checks through Nutanix Life Cycle Manager (LCM).
The issue is observed in environments where Nutanix LCM triggers multiple command executions simultaneously on the ESXi host, leading to resource contention and service failure.

You may encounter one or more of the following symptoms:

  • The ESXi host status changes to “Not Responding” in vCenter Server during the Nutanix LCM pre-upgrade process.
  • SSH access to the affected host becomes unresponsive or intermittent.
  • The hostd service stops responding, leading to disconnection from vCenter Server.
  • The following warnings or memory admission failures are recorded in /var/run/log/vmkernel and /var/run/log/vmkwarning:

vmkernel: cpu108:5398853)Admission failure in path: host/vim/vmvisor/ntnx:python.5398853:uw.5398853

vmkernel: cpu108:5398853)UserWorld 'python' 5398853 with cmdline 'python /get_one_time_password.py'

vmkernel: cpu108:5398853)requires ### KB, asked ### KB from ntnx (1882) which has ### KB occupied and ### KB available.

vmkwarning: could not change group to <host/vim/vmvisor/ntnx>: Admission check failed for memory resource
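
To see how often these admission failures occur and which resource group they name, the log lines can be scanned with a short script. The following Python sketch is illustrative only: the function name count_admission_failures and the regular expression are assumptions derived from the message format shown above, not a VMware or Nutanix tool.

```python
import re
from collections import Counter

# Assumed pattern, based on the "Admission failure in path:" message above.
# The capture stops at the first colon, so only the group path is kept.
ADMISSION_RE = re.compile(r"Admission failure in path: ([^\s:]+)")


def count_admission_failures(lines):
    """Count admission-failure messages per resource-group path."""
    counts = Counter()
    for line in lines:
        match = ADMISSION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the log excerpts in this article.
sample_lines = [
    "vmkernel: cpu108:5398853)Admission failure in path: "
    "host/vim/vmvisor/ntnx:python.5398853:uw.5398853",
    "vmkwarning: could not change group to <host/vim/vmvisor/ntnx>: "
    "Admission check failed for memory resource",
]

for group, count in count_admission_failures(sample_lines).items():
    print(group, count)
```

To scan a real log, pass an open file object instead of sample_lines, since the function accepts any iterable of lines.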

Environment

VMware vSphere ESXi 8.x

Nutanix Life Cycle Manager (LCM) 

Cause

This issue occurs when multiple Nutanix LCM pre-upgrade commands (for example, esxcli and vim-cmd) are executed simultaneously.

These concurrent processes drive up memory usage within the Nutanix memory group (ntnx) until its allocated memory pool is exhausted. Once the limit is reached, further memory requests in the group fail admission checks and the ESXi hostd service fails, causing the host to enter a Not Responding state.

In summary, the root cause is resource exhaustion within the Nutanix memory group during LCM operations, leading to service admission failures and hostd termination.
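
The effect of concurrency on a capped memory group can be illustrated with a simplified model. The Python sketch below is not the VMkernel admission algorithm. It only assumes that a request is admitted when it fits in the memory still available to the group, which is what the "occupied" and "available" fields in the log message suggest. The pool size and per-process sizes are hypothetical values chosen for illustration.

```python
def admit(requested_kb, occupied_kb, max_kb):
    """Return True if the request fits in the group's remaining memory."""
    available_kb = max_kb - occupied_kb
    return requested_kb <= available_kb


def run_concurrently(requests_kb, max_kb):
    """Admit requests in order while earlier ones still hold their memory."""
    occupied_kb = 0
    admitted, rejected = [], []
    for requested_kb in requests_kb:
        if admit(requested_kb, occupied_kb, max_kb):
            occupied_kb += requested_kb
            admitted.append(requested_kb)
        else:
            rejected.append(requested_kb)
    return admitted, rejected


# Hypothetical numbers: a 250 MB pool and five processes of 60 MB each.
POOL_MAX_KB = 250 * 1024
requests = [60 * 1024] * 5

admitted, rejected = run_concurrently(requests, POOL_MAX_KB)
print(len(admitted), "admitted,", len(rejected), "rejected")
```

In this model the first four processes fit and the fifth is rejected. The same five processes run one after another would all succeed, which is why simultaneous execution matters.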

Resolution

To prevent the ESXi host from becoming unresponsive during Nutanix LCM pre-upgrade checks, use one of the following options:

Option 1: Perform the ESXi Upgrade from vCenter Server

Run the ESXi upgrade directly from vCenter Server instead of Nutanix LCM.
This approach ensures better resource scheduling and avoids the resource contention seen during Nutanix-triggered upgrades.

Option 2: Increase Memory Allocation for the Nutanix Memory Group

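1. Check the current memory allocation for the Nutanix memory group. The log excerpts in this article report the group as host/vim/vmvisor/ntnx, so confirm that the group path used in the command matches the path on your host: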

    localcli --plugin-dir /usr/lib/vmware/esxcli/int sched group getmemconfig --group-path /host/vim/vimuser/ntnx

2. Increase the memory allocation. As a starting point, you can double the existing allocation (for example, to 500 MB):

    localcli --plugin-dir /usr/lib/vmware/esxcli/int sched group setmemconfig --group-path /host/vim/vimuser/ntnx --max 500 --units mb

3. Restart the hostd service after modification:

    /etc/init.d/hostd restart

4. Re-run the Nutanix LCM pre-upgrade checks to confirm stability.
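
The doubling in step 2 can be expressed as a small helper that builds the command line from the current allocation. This Python sketch only assembles and prints the commands shown above and does not run them. The function names and the 250 MB starting value are assumptions for illustration, so read the real current value from the getmemconfig output first.

```python
# Command prefix and group path copied from the steps above.
LOCALCLI_PREFIX = [
    "localcli", "--plugin-dir", "/usr/lib/vmware/esxcli/int",
    "sched", "group",
]
GROUP_PATH = "/host/vim/vimuser/ntnx"


def getmemconfig_cmd(group_path=GROUP_PATH):
    """Build the command that shows the group's current memory settings."""
    return LOCALCLI_PREFIX + ["getmemconfig", "--group-path", group_path]


def setmemconfig_cmd(current_max_mb, group_path=GROUP_PATH, factor=2):
    """Build the command that raises the group's maximum memory."""
    new_max_mb = current_max_mb * factor
    return LOCALCLI_PREFIX + [
        "setmemconfig", "--group-path", group_path,
        "--max", str(new_max_mb), "--units", "mb",
    ]


# Assumed current allocation of 250 MB, so doubling yields 500 MB.
print(" ".join(getmemconfig_cmd()))
print(" ".join(setmemconfig_cmd(250)))
```

Printing the commands rather than executing them leaves room to review the new value before applying it on the host.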