hostd crashes due to "Could not initialize AIO handles #########: No free handles"

Article ID: 408376

Products

VMware vSphere ESXi
VMware vSphere ESX 8.x
VMware vSphere ESX 7.x

Issue/Introduction

When multiple virtual machines with vGPU attached are running concurrently on a single host, the following warning message is frequently logged in the vmkernel.log file:

“WARNING: FDS: ###: Could not initialize AIO handles ########: No free handles”

Under these conditions, the following symptoms have been observed:

  • The hostd process crashes
  • Hardware health alarms are triggered
  • Attempts to open device files fail
  • Virtual machines crash, with the guest showing a Blue Screen of Death (BSOD)
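To check how often a host has logged the warning quoted above, the log can be piped through a small filter. This is a minimal sketch, assuming the fixed part of the message matches the quoted text (the numeric fields vary per host and are not matched); the function name count_aio_warnings is hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): count lines on stdin that match the
# FDS AIO-handle warning. Only the fixed text is matched; the numeric fields
# differ between hosts and log entries.
count_aio_warnings() {
    # grep -c prints the match count (0 when nothing matches); "|| true" keeps
    # the exit status at 0 in that case, because grep returns 1 on no match.
    grep -c 'FDS: .*Could not initialize AIO handles .*: No free handles' || true
}
```

On the ESXi host it could be run as, for example:

    count_aio_warnings < /var/log/vmkernel.log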

Environment

VMware vSphere ESX 7.0
VMware vSphere ESX 8.0
VMware ESX 9.0

Cause

By default, ESXi allocates up to 32,768 AIO handles for FDS per host. When many virtual machines with vGPU attached run on a single host, these AIO handles can be exhausted, after which device files can no longer be opened.

When this happens, hostd may crash while attempting to open a device file, hardware sensors may go unmonitored because ESXi services cannot open the IPMI device file, and other instability may occur.

Resolution

This issue is resolved by updating to VMware vSphere ESX 8.0 Update3i and then setting the FDSNumAIOHandles kernel parameter to 65536.
After installing VMware vSphere ESX 8.0 Update3i, follow the steps below to change the FDSNumAIOHandles parameter.

The Broadcom engineering team is working on a fix for VMware ESX 9.0.

Steps to change FDSNumAIOHandles:

  1. Log in to the ESXi host via SSH or the DCUI.

  2. Run the following command to set FDSNumAIOHandles to 65536.

      esxcli system settings kernel set -s FDSNumAIOHandles -v 65536

    Example:
    # esxcli system settings kernel set -s FDSNumAIOHandles -v 65536
    (No output is returned)

  3. Reboot the host so that the new value takes effect.

  4. Run the following command to verify that the new value has been applied.

      esxcli system settings kernel list -o FDSNumAIOHandles

    Example:

    # esxcli system settings kernel list -o FDSNumAIOHandles
    Name              Type    Configured  Runtime  Default  Description
    ----------------  ------  ----------  -------  -------  -----------
    FDSNumAIOHandles  uint32  65536       65536    32768    Number of AIO handles that we expect LibAIO at the FDS level to dole out. (Range: 1 - 65536)

    Note: Verify that the value in the Runtime column is 65536.
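The verification in step 4 can be scripted. This is a minimal sketch, assuming the column order shown in the example output above (Name, Type, Configured, Runtime, Default, Description); the function name fds_aio_status is hypothetical. It reports "applied" when the Runtime column matches the expected value, and "reboot pending" when only the Configured column does, which would be consistent with step 3 not yet having been performed.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): classify the FDSNumAIOHandles row from
#   esxcli system settings kernel list -o FDSNumAIOHandles
# read on stdin. Assumes the column order shown in the example output:
# Name, Type, Configured, Runtime, Default, Description.
fds_aio_status() {
    expected="${1:-65536}"
    awk -v want="$expected" '
        $1 == "FDSNumAIOHandles" {
            found = 1
            if ($4 == want)      print "applied"         # Runtime already updated
            else if ($3 == want) print "reboot pending"  # only Configured updated
            else                 print "not set"
        }
        END { if (!found) print "not found" }'
}
```

Example usage on the host:

    esxcli system settings kernel list -o FDSNumAIOHandles | fds_aio_status 65536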

 

Workaround:
On VMware vSphere ESX 8.0 Update3h or earlier, or on VMware ESX 9.0, the issue can be avoided by powering off the VMs that consume a large number of AIO handles or by migrating them to a host with lower AIO handle consumption. AIO handle usage can be checked with vmkvsitools in the ESX shell, as shown below.

In the output of the following command, VMs with large values in the first column are consuming a significant number of AIO handles. The second column is the VM's process ID (VMX Cartel ID).

vmkvsitools lsof | awk '$3=="CHAR" {print $0}' | grep vmgfx | awk '$2=="vmx" {print $1}' | sort | uniq -c | sort -k 1 -n

Example:
# vmkvsitools lsof | awk '$3=="CHAR" {print $0}' | grep vmgfx | awk '$2=="vmx" {print $1}' | sort | uniq -c | sort -k 1 -n
<Value1>  <VMX_Cartel_ID1>
<Value2>  <VMX_Cartel_ID2>
<Value3>  <VMX_Cartel_ID3>
...
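The counts produced by this pipeline can be totaled and filtered in one pass. A minimal sketch follows; the function name summarize_vmgfx_handles is hypothetical, and its default threshold of 1000 handles is an arbitrary, hypothetical value (the article does not define what counts as a large value). The total only covers vmgfx character-device handles held by vmx processes, so it is a rough indicator to compare against the default limit of 32,768 rather than a measure of every AIO handle in use on the host.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): summarize the "<count> <VMX Cartel ID>"
# lines produced by the vmkvsitools pipeline. The default threshold of 1000 is
# a hypothetical cut-off, not a value from the article.
summarize_vmgfx_handles() {
    threshold="${1:-1000}"
    awk -v t="$threshold" '
        NF >= 2 {
            total += $1                      # running sum of vmgfx handles
            if ($1 + 0 >= t + 0)             # flag VMs at or above the threshold
                printf "large: cartel %s uses %d handles\n", $2, $1
        }
        END { printf "total vmgfx handles: %d\n", total }'
}
```

Example usage, appending the helper to the pipeline above:

    vmkvsitools lsof | awk '$3=="CHAR" {print $0}' | grep vmgfx | awk '$2=="vmx" {print $1}' | sort | uniq -c | sort -k 1 -n | summarize_vmgfx_handles 1000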

The VM that owns each VMX Cartel ID can be identified with esxcli vm process list.

esxcli vm process list

Example:
# esxcli vm process list
<VM Name>
   World ID: ########
   Process ID: ###
   VMX Cartel ID: #######
   UUID: ## ## ## ## ## ## ## ##-## ## ## ## ## ## ## ##
   Display Name: ####-########-####-####-####-############
   Config File: ##########################################
...
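To see which VM each VMX Cartel ID from the vmkvsitools output belongs to without reading through the full listing, the two outputs can be joined. This is a minimal sketch, assuming the layout shown in the example above (an unindented VM name line followed by indented "Key: value" lines); the function names cartel_to_vm and annotate_counts and the temporary file path are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers) for mapping VMX Cartel IDs to VM names.

# Reads "esxcli vm process list" output on stdin and prints
# "<VMX Cartel ID> <VM name>" for each VM.
# Assumes each VM block starts with an unindented name line.
cartel_to_vm() {
    awk '
        /^[^ \t]/ { name = $0; next }        # unindented line: start of a VM block
        /^[ \t]+VMX Cartel ID:/ {
            id = $0
            sub(/^[ \t]*VMX Cartel ID:[ \t]*/, "", id)
            print id, name
        }'
}

# Reads "<count> <VMX Cartel ID>" lines on stdin and appends the VM name
# found in the map file given as $1 (output of cartel_to_vm).
annotate_counts() {
    awk '
        FILENAME == ARGV[1] {                # map file: "<id> <name>"
            map[$1] = substr($0, length($1) + 2)
            next
        }
        { print $1, $2, (($2 in map) ? map[$2] : "(unknown)") }' "$1" -
}
```

Example usage on the host, producing "<count> <VMX Cartel ID> <VM name>" lines:

    esxcli vm process list | cartel_to_vm > /tmp/cartel_map.txt
    vmkvsitools lsof | awk '$3=="CHAR" {print $0}' | grep vmgfx | awk '$2=="vmx" {print $1}' | sort | uniq -c | sort -k 1 -n | annotate_counts /tmp/cartel_map.txt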