ESXi host is not responding while VMs are running following a recovery from a storage outage

Article ID: 391818

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host is not responding in vCenter, but the virtual machines (VMs) hosted on it continue to run.
  • The "df -h" command hangs on the ESXi host command line.
  • ESXi logs (e.g. /var/run/log/vmkernel.log) cannot be viewed on the ESXi host command line.
  • After the hostd/vpxa services are restarted, the ESXi host shows the following error in the graphical user interface:
    503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x################] _serverNamespace = / action = Allow _port = 8309)
  • On the ESXi host console, pressing Alt-F12 shows error messages such as the following:
    cpu46:2097460)WARNING: NMP: nmpCompleteRetryForPath:364: Retry cmd 0x2a (0x############) to dev "naa.################################" failed on path ...
  • Attempts to stop hostd fail with the following messages:
    watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
    watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
    sh: can't kill pid #######: No such process 
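
The "df -h" symptom above can be checked without tying up the terminal. The following is a minimal sketch, assuming a Python 3 interpreter is available in the shell where the check is run; the function name command_hangs and the 10-second default timeout are illustrative choices, not part of any VMware tooling.

```python
import subprocess


def command_hangs(argv, timeout_s=10.0):
    """Return True if the command does not exit within timeout_s seconds."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return False
    except subprocess.TimeoutExpired:
        # Best-effort cleanup only. A process blocked on storage I/O may
        # not react to the signal, so do not wait for it to exit.
        proc.kill()
        return True
```

For example, command_hangs(["df", "-h"]) returns True when the command is still running after 10 seconds. The sketch deliberately does not wait for the child after kill(), because a process stuck on unresponsive storage may never exit.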

Environment

  • VMware vSphere ESXi

Cause

The management service, hostd, is unable to complete I/O and is in a hung state. As a result, the host cannot respond to vCenter even though the VMs continue to run.

Resolution

To resolve the issue, identify and troubleshoot underlying storage issues.
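
One way to narrow down the suspected device is to tally the NMP retry failures per device from log lines that were captured elsewhere (for example, copied from the console or a remote syslog target, since the local logs may be unreadable). The sketch below is an illustration only: the regular expression is an assumption based on the single warning format shown in this article, and the function name count_retry_failures is hypothetical.

```python
import re
from collections import Counter

# Assumed format, taken from the warning shown in this article:
#   ... NMP: nmpCompleteRetryForPath:364: Retry cmd 0x2a (0x...) to dev "naa...." failed on path ...
NMP_RETRY_FAILURE = re.compile(
    r'nmpCompleteRetryForPath:\d+: Retry cmd (0x[0-9a-fA-F]+) .*?to dev "([^"]+)" failed'
)


def count_retry_failures(lines):
    """Count NMP retry-failure warnings per device identifier."""
    counts = Counter()
    for line in lines:
        match = NMP_RETRY_FAILURE.search(line)
        if match:
            # group(1) is the SCSI command opcode, group(2) the device ID.
            counts[match.group(2)] += 1
    return counts
```

Calling count_retry_failures(captured_lines).most_common(1) gives the device with the most failed retries, which is a reasonable first candidate to investigate on the storage side. Lines that do not match the assumed format are ignored.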

To work around the issue:

  • Consider un-presenting the suspected LUN (Logical Unit Number) from the ESXi host or cluster. Refer to "How to detach a LUN device from ESXi hosts" for details.
  • If the LUN cannot be un-presented from the ESXi host:
    • Manually shut down the VMs using RDP/SSH.
    • Shut down and reboot the ESXi host.