ESXi hosts experience All Paths Down events on USB-based SD cards while using the vmkusb driver

Article ID: 323075

Products

VMware vSphere ESXi

Issue/Introduction

This article describes how well the vmkusb driver tolerates, and recovers from, brief SD card disconnections observed at the hardware level.

Symptoms:
ESXi 6.x/7.x hosts experience random All Paths Down (APD) events on their locally attached SD cards when using the vmkusb-based USB driver.

Example from vobd.log:

2020-05-22T15:55:02.568Z: [APDCorrelator] 1960326656825us: [vob.storage.apd.start] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down state.
2020-05-22T15:55:02.568Z: [scsiCorrelator] 1960326656798us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba32:C0:T0:L0 changed state from on
2020-05-22T15:55:02.568Z: [APDCorrelator] 1960324330689us: [esx.problem.storage.apd.start] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down state.
2020-05-22T15:55:02.571Z: [scsiCorrelator] 1960324334135us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device mpx.vmhba32:C0:T0:L0. Path vmhba32:C0:T0:L0 is down. Affected datastores: "".
2020-05-22T15:55:10.990Z: [netCorrelator] 1960335078949us: [vob.net.vmnic.linkstate.up] vmnic vusb0 linkstate up
2020-05-22T15:55:11.000Z: [netCorrelator] 1960332762933us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vusb0 linkstate is up
2020-05-22T15:57:22.568Z: [APDCorrelator] 1960466657332us: [vob.storage.apd.timeout] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2020-05-22T15:57:22.568Z: [APDCorrelator] 1960464330957us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
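The vobd.log excerpt above can be checked mechanically. The following is a minimal Python sketch, not a VMware tool: it assumes the line layout shown in the excerpt, and the function name `apd_durations` and the regular expression are illustrative. It pairs each `vob.storage.apd.start` event with the matching `vob.storage.apd.timeout` event for the same device and reports the elapsed time.

```python
import re
from datetime import datetime

# Two lines taken from the vobd.log excerpt above.
SAMPLE = """\
2020-05-22T15:55:02.568Z: [APDCorrelator] 1960326656825us: [vob.storage.apd.start] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down state.
2020-05-22T15:57:22.568Z: [APDCorrelator] 1960466657332us: [vob.storage.apd.timeout] Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
"""

# Assumed layout: timestamp, correlator name, microsecond counter, event ID, message.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z: "
    r"\[\w+\] \d+us: "
    r"\[vob\.storage\.apd\.(?P<kind>start|timeout)\] "
    r".*?identifier \[(?P<device>[^\]]+)\]"
)


def apd_durations(lines):
    """Return {device: seconds between apd.start and apd.timeout}."""
    started = {}
    durations = {}
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # not an APD start/timeout line
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        device = m["device"]
        if m["kind"] == "start":
            started[device] = ts
        elif device in started:
            durations[device] = (ts - started.pop(device)).total_seconds()
    return durations


print(apd_durations(SAMPLE.splitlines()))
```

For the sample lines this reports 140 seconds for mpx.vmhba32:C0:T0:L0, matching the duration stated in the timeout message itself.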



Example from vmkernel.log:

2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: NMP: nmpUnclaimPath:1561: NMP device "mpx.vmhba32:C0:T0:L0" quiesce state change failed: Busy
2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: ScsiPath: 6506: Path vmhba32:C0:T0:L0 is being removed
2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: ScsiPath: 6820: Failed to issue command 0x0 (cmdSN 0x0) on path vmhba32:C0:T0:L0: No connection
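To see which paths are affected across a longer vmkernel.log, the warnings can be grouped by path. This is a rough Python sketch that assumes the message shapes shown in the excerpt above; the helper name `warnings_per_path` and the pattern are illustrative only.

```python
import re
from collections import Counter

# Lines taken from the vmkernel.log excerpt above.
SAMPLE = """\
2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: NMP: nmpUnclaimPath:1561: NMP device "mpx.vmhba32:C0:T0:L0" quiesce state change failed: Busy
2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: ScsiPath: 6506: Path vmhba32:C0:T0:L0 is being removed
2020-05-22T15:55:02.567Z cpu0:2097360)WARNING: ScsiPath: 6820: Failed to issue command 0x0 (cmdSN 0x0) on path vmhba32:C0:T0:L0: No connection
"""

# Assumption: the path name appears as vmhbaN:Cn:Tn:Ln somewhere after the
# NMP or ScsiPath warning prefix.
PATH_RE = re.compile(r"WARNING: (?:NMP|ScsiPath): .*?(vmhba\d+:C\d+:T\d+:L\d+)")


def warnings_per_path(lines):
    """Count NMP/ScsiPath warnings for each path name found in the lines."""
    counts = Counter()
    for line in lines:
        m = PATH_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


print(warnings_per_path(SAMPLE.splitlines()))
```

A single path accumulating these warnings at the same timestamps as the APD events in vobd.log is consistent with the symptom described in this article.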


The ESXi host remains connected in vCenter; however, the ESXi operating system stays in a read-only state until the host is rebooted.

This issue does not reoccur if you move back to the legacy USB driver as outlined here: https://kb.vmware.com/s/article/2147650
*Note: The legacy USB/XHCI drivers are no longer available in ESXi 7.0 and beyond.

Environment

VMware ESXi 6.7.x
VMware ESXi 6.5.x
VMware vSphere ESXi 7.0.x

Cause

In ESXi 6.5, a new USB driver (vmkusb) was introduced to replace the legacy Linux-based USB and XHCI drivers used for communication with SCSI-based SD cards seen through onboard USB-based controllers.

While the cause of these events may vary and depends on the hardware layer, brief disconnections of the SD card from the USB-based interface while using the new vmkusb driver can result in an APD event that will most likely make the ESXi OS install location unavailable. The ESXi system then goes into read-only mode; however, it can still run and operate under most normal conditions within vCenter and still service virtual machine operations.

Further, the ESXi OS may recover from the APD event once the SD card reconnects, provided the SD card is not configured with an active core dump partition. This, however, is not a suggested workaround.

This issue is not seen to reoccur with the Linux-based legacy USB driver. This is believed to be due to the maturity of the legacy code and its more flexible retry handling for lost connections to the USB device, so it recovers without any subsequent events or impact.

Resolution

VMware cannot identify and resolve the hardware-layer disconnects being observed. However, VMware is aware of the resiliency behavior of the vmkusb driver, and this is being addressed in a future vmkusb driver that is not yet released.

Please work with your server/SD card hardware vendor to address concerns of SD card disconnects.

For assistance with a workaround vmkusb driver, please open a support ticket and reference this KB article.

Additional Information

Impact/Risks:
As the SD cards are used as the install medium for the ESXi OS, the All Paths Down event can cause the ESXi OS to go into a read-only state for all OS files and folders on the SD card. A reboot is required to return the system to a read/write state. This issue does not impact virtual machines or ESXi host capabilities in vCenter such as Storage vMotion, vMotion, and HA events.