Path failover on a Cisco UCS system takes exactly 10 seconds

Article ID: 308527

Updated On: 04-21-2025

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

  • Path failover on a Cisco UCS system takes exactly 10 seconds.
  • I/O pauses for exactly 10 seconds, and path failover occurs only after that pause.
  • In the /var/log/vmkernel.log file, you see entries similar to this:

    2017-01-11T15:34:39.787Z cpu0:33589)NMP: nmp_ThrottleLogForDevice:3231: last error status from device naa.####097000019700022853303030#### repeated 5120 times
    2017-01-11T15:34:39.787Z cpu0:32899)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.####097000019700022853303030####".
    2017-01-11T15:34:39.787Z cpu0:32899)ScsiDevice: 7226: No Handlers registered! (naa.####097000019700022853303030####)!
    2017-01-11T15:34:39.787Z cpu0:32899)ScsiDevice: 4562: Device state of naa.####097000019700022853303030#### set to APD_START; token num:1
    2017-01-11T15:34:39.787Z cpu0:32899)StorageApdHandler: 1204: APD start for 0x43034c675850 [naa.####097000019700022853303030####]
    2017-01-11T15:34:39.787Z cpu20:33125)StorageApdHandler: 421: APD start event for 0x43034c675850 [naa.####097000019700022853303030####]
    2017-01-11T15:34:39.787Z cpu20:33125)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.####097000019700022853303030####] has entered the All Paths Down state.
    2017-01-11T15:35:01.890Z cpu10:33423)qlnativefc: vmhba3(84:0.0): fcport 5000097368039009 (targetId = 0) ONLINE
    2017-01-11T15:35:01.892Z cpu34:33428)qlnativefc: vmhba4(84:0.1): fcport 5000097368039008 (targetId = 0) ONLINE
    2017-01-11T15:35:02.796Z cpu29:33424)ScsiScan: 836: Path vmhba3:C0:T0:L0 supports REPORT LUNS 0x11
    2017-01-11T15:35:02.797Z cpu2:33429)ScsiScan: 836: Path vmhba4:C0:T0:L0 supports REPORT LUNS 0x11
    2017-01-11T15:35:02.802Z cpu7:32899)ScsiDevice: 4584: Setting Device naa.####097000019700022853303030#### state back to 0x2
    2017-01-11T15:35:02.802Z cpu7:32899)ScsiDevice: 7226: No Handlers registered! (naa.####097000019700022853303030####)!
    2017-01-11T15:35:02.802Z cpu7:32899)ScsiDevice: 4605: Device naa.60000970000197000228533030303031 is Out of APD; token num:1
    2017-01-11T15:35:02.802Z cpu7:32899)StorageApdHandler: 1314: APD exit for 0x43034c675850 [naa.####097000019700022853303030####]
    2017-01-11T15:35:02.802Z cpu20:33125)StorageApdHandler: 509: APD exit event for 0x43034c675850 [naa.####097000019700022853303030####]
    2017-01-11T15:35:02.802Z cpu20:33125)StorageApdHandlerEv: 117: Device or filesystem with identifier [naa.####097000019700022853303030####] has exited the All Paths Down state.
    2017-01-11T15:35:03.787Z cpu44:33499)NMP: nmp_ResetDeviceLogThrottling:3349: last error status from device naa.####097000019700022853303030#### repeated 255 times

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
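
    To measure how long a device actually stayed in the All Paths Down state, the StorageApdHandlerEv "entered" and "exited" lines can be paired by device identifier. The following is a minimal sketch, not a VMware tool: the function names (parse_ts, apd_durations) are hypothetical, and the pattern assumes the message format and timestamp layout shown in the excerpt above, which may differ between ESXi releases.

```python
import re
from datetime import datetime

# Assumption: event lines look like the StorageApdHandlerEv entries in the
# excerpt above (ISO-style UTC timestamp, then the "entered"/"exited" message).
APD_EVENT = re.compile(
    r"^\s*(?P<ts>\S+Z) .*StorageApdHandlerEv: \d+: Device or filesystem with "
    r"identifier \[(?P<dev>[^\]]+)\] has (?P<kind>entered|exited) "
    r"the All Paths Down state"
)


def parse_ts(ts):
    """Parse a vmkernel.log timestamp such as 2017-01-11T15:34:39.787Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def apd_durations(lines):
    """Return a list of (device, seconds) for each entered/exited APD pair."""
    started = {}   # device identifier -> time the device entered APD
    results = []
    for line in lines:
        match = APD_EVENT.match(line)
        if not match:
            continue
        when = parse_ts(match.group("ts"))
        dev = match.group("dev")
        if match.group("kind") == "entered":
            started[dev] = when
        elif dev in started:
            # Pair the exit with the most recent entry for the same device.
            elapsed = when - started.pop(dev)
            results.append((dev, elapsed.total_seconds()))
    return results
```

    For example, passing the lines of a copy of /var/log/vmkernel.log to apd_durations returns one (device, seconds) tuple per completed APD episode, which can be compared before and after changing the adapter policy.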

 

Environment

VMware vCenter Server Appliance 5.5.x
VMware vCenter Server 5.5.x
VMware Update Manager 6.0
VMware vSphere ESXi 5.5
VMware vCenter Server 6.5.x
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere Client 5.5
VMware vCenter Server 6.0.x
VMware Update Manager 5.5
VMware vCenter Server Appliance 6.0.x
VMware vCenter Server Appliance 6.5.x
VMware Update Manager 6.5

Cause

This issue occurs when the storage adapter takes 10 seconds to send the 0x1 command to the ESXi host, which results in a 10-second delay in path failover.

Note: The ESXi host fails over the path as soon as it receives the Test Unit Ready (TUR) command 0x1 from the storage adapter.

 

Resolution

To resolve this issue, set the Link Down Timeout field to 2,000 ms in the Fibre Channel adapter policy in Cisco UCS Manager (UCSM). For more information on updating the Fibre Channel adapter policy, see Configuring Fibre Channel Adapter Policies.

Note: Once the Link Down Timeout field is set to 2,000 ms, the 0x1 command is triggered after 2 seconds, so the I/O pause is reduced to 2 seconds.