A back-end failure between an SP and disk enclosures on an EMC AX-series array may result in ESX 4.x hosts using the Native Multipathing Plugin to see only Standby paths

Article ID: 344216

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
A back-end failure in an AX-series CLARiiON SAN array may cause these symptoms for LUNs backed by the affected disk enclosures:
  • Loss of availability to those LUNs on VMware ESX 4.x hosts until the LCC is replaced.
  • The Path Selection Plugin (PSP) is unable to select a path to fail over to, because all paths are reported as standby instead of active. The VMkernel logs a message similar to:

    nmp_SelectPathAndIssueCommand: PSP selected path "vmhba2:C0:T2:L10" in a bad state (standby)on device "naa.600601601f70190016361881f3b4de11".
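
To check a saved log for this symptom, the message above can be matched with a regular expression. The following is a minimal sketch, not a VMware tool: the function names are hypothetical, the pattern is derived only from the single sample line quoted above, and the caller is assumed to supply the log lines from wherever the host's VMkernel log is collected.

```python
import re

# Pattern built from the sample message above. Whitespace before "on device"
# is optional because the sample has none after "(standby)".
BAD_STATE_RE = re.compile(
    r'PSP selected path "(?P<path>[^"]+)" in a bad state \((?P<state>\w+)\)\s*'
    r'on device "(?P<device>[^"]+)"'
)


def find_bad_state_paths(lines):
    """Return (path, state, device) tuples for each matching log line."""
    hits = []
    for line in lines:
        m = BAD_STATE_RE.search(line)
        if m:
            hits.append((m.group("path"), m.group("state"), m.group("device")))
    return hits


def devices_with_standby_selection(lines):
    """Map each device ID to the set of paths selected while in standby state."""
    result = {}
    for path, state, device in find_bad_state_paths(lines):
        if state == "standby":
            result.setdefault(device, set()).add(path)
    return result


# The sample message from this article, split only for line length.
sample = [
    'nmp_SelectPathAndIssueCommand: PSP selected path "vmhba2:C0:T2:L10" '
    'in a bad state (standby)on device "naa.600601601f70190016361881f3b4de11".',
]

if __name__ == "__main__":
    for device, paths in devices_with_standby_selection(sample).items():
        print(device, sorted(paths))
```

A device that appears in the result has had a standby path selected by the PSP at least once; this alone does not prove that every path to the device is standby, so confirm the path states on the host before acting.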


Environment

VMware ESX 4.1.x
VMware ESX 4.0.x
VMware ESXi 4.1.x Embedded
VMware vSphere ESXi 5.5
VMware ESXi 4.0.x Installable
VMware ESXi 4.0.x Embedded
VMware ESXi 4.1.x Installable

Resolution

When a back-end failure occurs between the Storage Processor (SP) and the disk enclosures, the LUNs backed by the affected disk enclosure enter an unowned state. As a result, both SPs report to the ESX 4.x host that the logical units are in a NOT READY state.
A LUN can become unowned by an SP in these scenarios:

  • A Link Control Card (LCC) failure between the storage processor and the disk enclosure
  • An additional drive failure in a RAID group that already has a failed drive

To work around this issue, use one of these options:
  • When a back-end failure occurs on the array, manually trespass the affected LUNs to the peer SP.
  • Install and use EMC PowerPath/VE on the ESX 4.x hosts.
  • Use a RAID configuration in which the two disks of each mirrored pair are not in the same disk enclosure, such as RAID 1+0. This ensures that the peer SP is still able to access the LUN in the event of a back-end failure.

For more information, see EMC Primus article emc253491.