Determining why an ESX/ESXi host does not respond to user interaction at the console

Article ID: 341047

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • An ESX/ESXi host is not reachable via the network with the vSphere Client.
  • An ESX/ESXi host is not reachable via the network with ping.
  • Virtual machines running on the ESX/ESXi host are not reachable via the network.
  • vCenter Server reports the host as Not Responding.
  • The ESX/ESXi host does not respond to local commands or input at the console.
  • Pressing Alt + F12 at the console does not switch to the VMkernel log display.


Environment

VMware ESX 4.0.x
VMware ESX 4.1.x
VMware ESXi 4.0.x Embedded
VMware ESX Server 3.5.x
VMware ESX Server 3.0.x
VMware vSphere ESXi 5.5
VMware ESXi 4.0.x Installable
VMware vSphere ESXi 5.1
VMware ESXi 4.1.x Embedded
VMware ESXi 3.5.x Embedded
VMware vSphere ESXi 5.0
VMware ESXi 3.5.x Installable
VMware ESXi 4.1.x Installable

Resolution

Several factors can cause an ESX/ESXi host to become unresponsive. For example:

  • Defective or unresponsive hardware
  • An operational busy loop in the VMkernel, driver module, or service console
  • A component holding a lock needed by other components
  • A process that is consuming a high amount of resources

Troubleshooting this type of issue after it has occurred is difficult because you cannot interact with the ESX/ESXi host while it is in this state.

Note: Many external influences may yield similar symptoms but have very different underlying issues. For example, a network outage can result in a situation where an ESX/ESXi host and all running virtual machines become unresponsive, console authentication using remote directory services fails, and remote BMC management fails.

These limitations further complicate troubleshooting:

  • If the issue has only occurred once, analysis is limited to the logs generated prior to the single occurrence.
  • If the issue has only occurred once, you cannot identify patterns between multiple occurrences.
  • The logs generated by a single event may not be conclusive, and determining the root cause may not be possible.

If an ESX/ESXi host is currently in an unresponsive state, gather this information:

  1. Press the NumLock key on your keyboard and observe if the NumLock light state changes. A successful light state change indicates that the BIOS is responsive.
  2. Check if there is any active disk or network traffic using status lights or other hardware monitoring on the disk drive array, network interface cards or upstream switches. Active egress traffic indicates that the ESX/ESXi host is still functioning.
  3. VMware HA monitors ESX/ESXi host availability in part based on response to ICMP (ping) network traffic. If the ESX/ESXi host is a member of an HA cluster, check the logs on other cluster members to determine when or if they lost access to this host. For more information, see Troubleshooting VMware High Availability (HA) in VMware vSphere 4.x (1001596) or Troubleshooting VMware High Availability (HA) issues in VMware vCenter Server 5.x and 6.0 (2004429).
  4. Trigger an NMI at the hardware level and observe how ESX/ESXi responds. For more information, see Using hardware NMI facilities to troubleshoot unresponsive hosts (1014767). If a purple diagnostic screen occurs after triggering the NMI, take a screenshot.
  5. Attempt to interact with the server via a baseboard management controller (BMC) interface, such as iLO, DRAC, or RSA. If aspects of this interface other than the console are also unresponsive, this indicates that the issue is hardware related.
  6. Reboot the ESX/ESXi host.
  7. Collect diagnostic information from the host for further analysis. For more information, see Collecting diagnostic information for VMware products (1008524).
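
The checks above can be recorded and turned into preliminary hints. The following is a minimal sketch in Python, not a VMware tool: the `Observations` class, its field names, and the `triage_hints` function are hypothetical names introduced for illustration, and the hint wording only restates the conclusions given in this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observations:
    """Hypothetical record of the manual checks; None means 'not checked'."""
    numlock_light_toggles: Optional[bool] = None       # step 1
    disk_or_network_activity: Optional[bool] = None    # step 2
    host_responded_to_nmi: Optional[bool] = None       # step 4
    bmc_non_console_responsive: Optional[bool] = None  # step 5


def triage_hints(obs: Observations) -> List[str]:
    """Map recorded observations to the conclusions stated in the article."""
    hints: List[str] = []
    if obs.numlock_light_toggles is True:
        hints.append("BIOS is responsive")
    if obs.disk_or_network_activity is True:
        hints.append("Active traffic: host is still functioning")
    if obs.host_responded_to_nmi is True:
        hints.append("Screenshot any purple diagnostic screen")
    elif obs.host_responded_to_nmi is False:
        hints.append("No NMI response: likely unresponsive hardware")
    if obs.bmc_non_console_responsive is False:
        hints.append("BMC also unresponsive: issue is hardware related")
    return hints


if __name__ == "__main__":
    # Example values are made up for illustration.
    observed = Observations(numlock_light_toggles=True,
                            bmc_non_console_responsive=False)
    for hint in triage_hints(observed):
        print(hint)
```

Recording each check as true, false, or not checked keeps the notes taken at the console separate from their interpretation, which helps when the same checklist is repeated across several occurrences.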

If your issue is reproducible or occurs regularly, follow these steps to collect more data:

  1. Set up serial-line logging to collect log information that may not be logged normally during this condition. For more information, see Enabling serial-line logging for an ESX/ESXi host (1003900).
  2. Set up top and esxtop in batch mode to collect performance data on the server leading up to the event. For more information, see Using performance collection tools to gather data for fault analysis (1006797).
  3. Configure your system to fail with a purple screen error after receiving an NMI generated manually from the hardware. For more information on changing how an ESX or ESXi host reacts to an NMI, see Using hardware NMI facilities to troubleshoot unresponsive hosts (1014767). For more information on triggering an NMI at the hardware level, contact your hardware vendor.

    Note: If the ESX/ESXi host does not respond after you generate a hardware NMI, the issue is likely due to unresponsive hardware. Contact your hardware vendor for further assistance in troubleshooting this issue.
  4. Collect the logs, serial logs, and performance data for further analysis. For more information, see Collecting diagnostic information for VMware products (1008524).
  5. Ask yourself these questions. The answers may help determine the cause of the issue. Depending on the answers, you may need to investigate your environment further for the root cause.
    • How many times has the ESX host experienced this condition?
    • What were the exact times and dates that the host became unresponsive?
    • Have any other hosts experienced this issue?
    • What else was happening in your environment at the time of the events?
    • Is there a pattern to the times when the host becomes unresponsive?
    • Are there any regularly scheduled jobs running when the host becomes unresponsive?
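
Several of the questions above concern timing patterns. The following is a minimal sketch in Python for summarizing recorded occurrence times by hour, by weekday, and by the gap between events. The function name `summarize_occurrences`, the ISO 8601 input format, and the sample timestamps are assumptions made for illustration; they are not taken from any VMware log format.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarize_occurrences(timestamps: List[str]) -> Dict[str, object]:
    """Summarize when a host became unresponsive.

    timestamps: ISO 8601 strings such as "2024-03-04T02:15:00"
    (an assumed input format; adapt the parsing to your own records).
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    by_hour = Counter(t.hour for t in times)
    by_weekday = Counter(WEEKDAYS[t.weekday()] for t in times)
    # Hours elapsed between each pair of consecutive occurrences.
    gaps_hours = [
        round((later - earlier).total_seconds() / 3600, 1)
        for earlier, later in zip(times, times[1:])
    ]
    return {
        "count": len(times),
        "by_hour": dict(by_hour),
        "by_weekday": dict(by_weekday),
        "gaps_hours": gaps_hours,
    }


if __name__ == "__main__":
    # Made-up example data, not taken from a real host.
    example = [
        "2024-03-04T02:15:00",
        "2024-03-11T02:20:00",
        "2024-03-18T02:05:00",
    ]
    print(summarize_occurrences(example))
```

Occurrences that cluster at one hour or recur at a near-constant interval point toward a scheduled job or other periodic activity, which is worth comparing against the schedules in your environment.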

If you need help analyzing the logs and data, contact VMware Technical Support. For more information, see How to Submit a Support Request.


Additional Information

For information on ESXi 5.5, see the documentation center.

Troubleshooting VMware High Availability (HA) in VMware vSphere 4.x
Enabling serial-line logging for an ESX/ESXi host
Using performance collection tools to gather data for fault analysis
Collecting diagnostic information for VMware products
Using hardware NMI facilities to troubleshoot unresponsive hosts
Troubleshooting VMware High Availability (HA) issues in VMware vCenter Server 5.x and 6.0
Collecting diagnostic information for VMware ESX/ESXi