Third-party hyperconverged infrastructure setups experience a soft lockup and become unresponsive indefinitely

Article ID: 317892

Products

VMware vSphere ESXi 7.0

VMware vSphere ESXi 8.0

Issue/Introduction

This article provides steps to troubleshoot unresponsive storage controller VMs in third-party hyperconverged infrastructure environments.

 

  • Controller VMs with direct I/O devices in third-party hyperconverged infrastructure environments experience guest kernel soft lockups.

 

Cause

A rare race condition between interrupt virtualization and the VMkernel CPU scheduler might result in guest kernel soft lockups in controller VMs with direct I/O devices in third-party hyperconverged infrastructure environments. The soft lockups cause virtual machines in the entire cluster to lose NFS storage connectivity and I/O access.

Resolution

This is a known issue in VMware ESXi 7.0.x.

This issue is resolved in VMware ESXi 8.0 U3 GA. To download it, go to Download Broadcom products and software.

 

To work around this issue in ESXi 7.0.x, follow these steps:

1. Open a console to the ESXi host. For more information, see Unable to connect to an ESXi host using Secure Shell.

2. Check the current value of the option using the esxcfg-advcfg command:

    For boot-time options in the VMkernel.boot.* namespace:

   esxcfg-advcfg --get-kernel vtdEnableIntrVirt

Note: In ESXi 7.0 U2 and later, the default value of this option is TRUE.

3. Set a new value for the option using the esxcfg-advcfg command:

  • esxcfg-advcfg --set-kernel "FALSE" vtdEnableIntrVirt
  • esxcfg-advcfg --get-kernel vtdEnableIntrVirt [This confirms that the new value has been recorded.]
  • Reboot the server after making the change.
  • Once the server has rebooted, run the command esxcfg-advcfg --get-kernel vtdEnableIntrVirt to verify that the value shows as FALSE.
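
The check-then-set sequence in steps 2 and 3 can be sketched as a small shell function. This is a minimal sketch, not a vendor-supplied script: the function name disable_kernel_option and the overridable ADVCFG variable are hypothetical, and the logic assumes that the output of esxcfg-advcfg --get-kernel contains the literal TRUE or FALSE. Only the --get-kernel and --set-kernel invocations are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: sets a VMkernel boot option to FALSE only when it is
# not already FALSE. ADVCFG can be overridden to dry-run the logic off-host.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"

disable_kernel_option() {
    opt="$1"
    # Step 2: read the current value of the boot-time option.
    current="$("$ADVCFG" --get-kernel "$opt")" || return 1
    case "$current" in
        *FALSE*|*false*|*False*)
            echo "$opt is already FALSE; no change made"
            ;;
        *)
            # Step 3: set the option to FALSE; it takes effect after a reboot.
            "$ADVCFG" --set-kernel "FALSE" "$opt" || return 1
            echo "$opt set to FALSE; reboot the host to apply"
            ;;
    esac
}
```

On an ESXi 7.0.x host, the function could be called as disable_kernel_option vtdEnableIntrVirt. The reboot is deliberately left as a manual step so that it can be scheduled with the cluster's maintenance procedure.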

 


To work around this issue in ESXi 8.0.x, follow these steps:

1. Open a console to the ESXi host. For more information, see Unable to connect to an ESXi host using Secure Shell.

2. Check the current value of the option using the esxcfg-advcfg command:

    For boot-time options in the VMkernel.boot.* namespace:

   esxcfg-advcfg --get-kernel iovEnablePostedIntr

3. Set a new value for the option using the esxcfg-advcfg command:

  • esxcfg-advcfg --set-kernel "FALSE" iovEnablePostedIntr
  • esxcfg-advcfg --get-kernel iovEnablePostedIntr [This confirms that the new value has been recorded.]
  • Reboot the server after making the change.
  • Once the server has rebooted, run the command esxcfg-advcfg --get-kernel iovEnablePostedIntr to verify that the value shows as FALSE.
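
The two workarounds differ only in the option name, and both end with the same post-reboot check. A minimal sketch of that mapping and check follows. The function names option_for_version and verify_option_false are hypothetical, the version string (for example 7.0.3) is supplied by the caller, and the check assumes that the --get-kernel output contains the literal FALSE. Because the issue is resolved in 8.0 U3, the 8.0 branch is only relevant to earlier 8.0.x builds.

```shell
#!/bin/sh
# Hypothetical helpers; only the option names and the --get-kernel call
# come from this article. ADVCFG can be overridden for an off-host dry run.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"

# Map an ESXi version string (for example "7.0.3") to the boot option
# that the matching workaround sets to FALSE.
option_for_version() {
    case "$1" in
        7.0*) echo "vtdEnableIntrVirt" ;;
        8.0*) echo "iovEnablePostedIntr" ;;
        *)    echo "no workaround listed for version $1" >&2; return 1 ;;
    esac
}

# Post-reboot check: succeed only if the option now reads FALSE.
verify_option_false() {
    value="$("$ADVCFG" --get-kernel "$1")" || return 1
    case "$value" in
        *FALSE*|*false*|*False*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After the reboot, a call such as opt="$(option_for_version 8.0.2)" && verify_option_false "$opt" && echo "workaround active" would confirm that the setting persisted.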