ESX Server virtual machines stop responding due to shared storage connectivity issues

Article ID: 306508

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article addresses one cause of ESX Server virtual machines appearing to halt or stop responding: intermittent shared storage connectivity issues. It helps you identify these issues, which cause virtual machines to disconnect, halt, and stop responding until shared storage connectivity is restored.

Symptoms:

  • An ESX Server virtual machine appears to be halted or not responding during normal use or during VMotion.
  • The virtual machine may appear to halt or stop responding during boot.



Environment

VMware ESX 8.0.x
VMware ESX 7.0.x
VMware ESXi 3.5.x Embedded
VMware ESX 4.0.x
VMware ESXi 4.1.x Embedded
VMware ESX Server 3.5.x
VMware ESXi 4.0.x Embedded
VMware ESX Server 2.5.x
VMware ESX 4.1.x
VMware ESXi 3.5.x Installable
VMware ESX Server 3.0.x
VMware ESXi 4.1.x Installable
VMware ESXi 4.0.x Installable

Resolution

Identifying correct LUN pathing settings

The first step in identifying shared storage connectivity issues is to make sure your SAN and ESX Server are configured to work properly with each other. For instance, note whether your SAN is an Active / Active or an Active / Passive storage array. An Active / Active storage array should use a path policy of "Fixed", and an Active / Passive storage array should use a path policy of "Most Recently Used (MRU)". Additionally, make sure to use the correct "Host Mode Type" on your shared storage (SAN) for LUNs presented to your ESX hosts.
 
For additional information on path policies in ESX Server see: "Obtaining LUN pathing information for ESX Server 3" (1003973)
 
Setting an incorrect storage path policy for your SAN model may cause "path thrashing", which in turn may cause your shared storage devices to disconnect from your ESX Server hosts.
To find out whether your certified storage device is an "active / active" device that requires a "fixed" path policy or an "active / passive" device that requires an "mru" path policy, look up the device in VMware's online list of supported and certified Storage / SAN devices for your version of ESX Server.
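The pairing of array type and path policy described above can be written down as a small lookup, which may be handy when auditing an inventory of hosts. This is a minimal sketch, not a VMware tool: the function names and the short policy labels "fixed" and "mru" are illustrative assumptions, and the correct policy for a specific array should still be confirmed against VMware's list of certified storage devices.

```python
# Hypothetical helper: names are illustrative and not part of any VMware API.
# The mapping follows the guidance in this article:
#   Active / Active  -> Fixed
#   Active / Passive -> Most Recently Used (MRU)
PATH_POLICY_BY_ARRAY_TYPE = {
    "active/active": "fixed",
    "active/passive": "mru",
}


def normalize_array_type(array_type: str) -> str:
    """Lowercase and remove all whitespace, so 'Active / Active' -> 'active/active'."""
    return "".join(array_type.lower().split())


def recommended_path_policy(array_type: str) -> str:
    """Return the path policy this article recommends for the given array type."""
    key = normalize_array_type(array_type)
    if key not in PATH_POLICY_BY_ARRAY_TYPE:
        raise ValueError(f"Unknown storage array type: {array_type!r}")
    return PATH_POLICY_BY_ARRAY_TYPE[key]


def is_policy_correct(array_type: str, configured_policy: str) -> bool:
    """True if the configured policy matches the recommendation for the array type."""
    return configured_policy.strip().lower() == recommended_path_policy(array_type)


if __name__ == "__main__":
    # An Active / Passive array configured as Fixed is a candidate for path thrashing.
    print(is_policy_correct("Active / Passive", "Fixed"))
```

A result of False flags a host whose configured policy does not match the array type and is therefore worth checking for path thrashing.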
 
 

Defining your Host Mode Type

Using the wrong "Host Mode Type" for LUNs presented to ESX Server may also cause shared storage disconnects. Consult with your storage vendor for the specific "Host Mode Type" you need to use on your storage device, so that the LUNs you present to ESX Server version 2.5.x and 3.x systems function properly.

Using the VMkernel error log to diagnose storage issues

Additionally, you may log in to your ESX Server service console as root and check the /var/log/vmkernel log file for entries similar to:
 
Feb 10 13:41:16 esx02 vmkernel: 93:07:30:44.339 cpu14)WARNING: SCSI: 5663: vmhba1:0:30:1 status = 2/0 0x6 0x29 0x0
 
The values after "status =" are the device status and host status (2/0), followed by the SCSI sense data in hex: the Sense Key (0x6), the Sense Code (0x29), and the Extended Sense Code (0x0).
The above error message translates to:

Device Check Condition
Host no errors
UNIT ATTENTION
POWER ON RESET or BUS DEVICE RESET OCCURRED
 
Additional error messages that indicate storage connectivity problems are:
 
Device Check Condition
Host no errors
ABORTED COMMAND
COMMANDS CLEARED BY ANOTHER INITIATOR

Error messages like those listed above in the ESX Server's /var/log/vmkernel log file indicate that the shared storage device has encountered problems that caused it to disconnect from ESX Server. Consequently, the shared storage connectivity failure causes virtual machines to disconnect and stop responding until shared storage connectivity is restored. Review your shared storage device log files for any indication of failures and contact your storage vendor for additional assistance.
For additional information regarding SCSI Sense Code interpretation, see How to interpret SCSI events (289902).
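Decoding these entries by hand is error-prone, so a short script can help when reviewing a long vmkernel log. The following is a minimal sketch, not a VMware utility. It assumes the line layout shown in the example entry (a vmhba path, then "status = D/H", then three hex values) and assumes the device and host status fields are decimal. The lookup tables cover only the codes discussed in this article, using standard SCSI sense key and sense code values, and all function and variable names are illustrative.

```python
import re

# Partial lookup tables: only the codes discussed in this article.
DEVICE_STATUS = {0: "Good", 2: "Check Condition"}
HOST_STATUS = {0: "no errors"}
SENSE_KEYS = {0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}
SENSE_CODES = {
    (0x29, 0x00): "POWER ON RESET or BUS DEVICE RESET OCCURRED",
    (0x2F, 0x00): "COMMANDS CLEARED BY ANOTHER INITIATOR",
}

# Assumed layout: "<vmhba path> status = D/H 0xSK 0xASC 0xASCQ"
STATUS_RE = re.compile(
    r"(?P<path>vmhba\d+:\d+:\d+:\d+)\s+status\s*=\s*"
    r"(?P<dev>\d+)/(?P<host>\d+)\s+"
    r"(?P<key>0x[0-9a-fA-F]+)\s+(?P<asc>0x[0-9a-fA-F]+)\s+(?P<ascq>0x[0-9a-fA-F]+)"
)


def decode_vmkernel_line(line):
    """Return a dict describing the SCSI status in a log line, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    dev = int(match.group("dev"))
    host = int(match.group("host"))
    key = int(match.group("key"), 16)
    asc = int(match.group("asc"), 16)
    ascq = int(match.group("ascq"), 16)
    return {
        "path": match.group("path"),
        "device_status": DEVICE_STATUS.get(dev, f"unknown ({dev})"),
        "host_status": HOST_STATUS.get(host, f"unknown ({host})"),
        "sense_key": SENSE_KEYS.get(key, f"unknown (0x{key:x})"),
        "sense_code": SENSE_CODES.get(
            (asc, ascq), f"unknown (0x{asc:x} 0x{ascq:x})"
        ),
    }


if __name__ == "__main__":
    sample = (
        "Feb 10 13:41:16 esx02 vmkernel: 93:07:30:44.339 cpu14)WARNING: "
        "SCSI: 5663: vmhba1:0:30:1 status = 2/0 0x6 0x29 0x0"
    )
    for field, value in decode_vmkernel_line(sample).items():
        print(f"{field}: {value}")
```

Codes missing from the tables are reported as unknown rather than guessed, so the tables can be extended from the SCSI sense code reference as needed.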