Host is in a not-responding state due to a hardware issue - Error: Slot/Connector Fault Status

Article ID: 392731

Products

VMware vSphere ESXi

Issue/Introduction

  • The host responds extremely slowly when accessed over SSH (for example, with PuTTY) or through the Direct Console User Interface (DCUI), and eventually freezes completely
  • DCUI functionality is limited, with system logs not accessible
  • System Event Logs (SEL) show multiple "Slot/Connector Fault Status" assertions
  • Management agents (hostd and vpxa) cannot be restarted successfully
  • A gap in system logging occurs during the outage period

Environment

VMware ESXi 7.0 or newer

Cause

The issue is caused by hardware failures related to the server's expansion slots or connectors, as evidenced by the "Slot/Connector Fault Status" assertions in the System Event Logs (SEL). These hardware failures can affect:

  • SD card boot media connected to the affected slot
  • Storage adapter connectivity, resulting in Permanent Device Loss (PDL) events
  • System stability and management functionality
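To gauge how persistent the fault is, the assertions can be counted in a text export of the SEL. The following is a minimal sketch, assuming the SEL has been exported as plain text with one event per line and that asserted events contain the word "Asserted"; export formats differ between server vendors, and the function name and sample lines are hypothetical.

```python
import re


def count_slot_fault_assertions(sel_lines):
    """Count SEL lines that report an asserted Slot/Connector Fault Status.

    Assumption: one event per line, with the words "Asserted" or
    "Deasserted" marking the event direction. Adjust the matching to the
    export format your hardware vendor actually produces.
    """
    count = 0
    for line in sel_lines:
        # Collapse repeated whitespace and ignore case before matching.
        text = re.sub(r"\s+", " ", line).lower()
        if "slot/connector fault status" not in text:
            continue
        # "deasserted" contains "assert", so exclude it explicitly.
        if "deassert" in text:
            continue
        if "assert" in text:
            count += 1
    return count


# Hypothetical sample lines, for illustration only.
sample = [
    "2024-01-15 10:23:45 Slot/Connector Fault Status Asserted",
    "2024-01-15 10:24:02 Slot/Connector  Fault Status Asserted",
    "2024-01-15 10:30:11 Slot/Connector Fault Status Deasserted",
    "2024-01-15 10:31:00 Temperature Upper Critical Asserted",
]
print(count_slot_fault_assertions(sample))
```

A count that keeps growing across exports taken at different times suggests the fault is persistent rather than a one-off event.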

The problem may be exacerbated by:

  • Outdated firmware on network adapters
  • Critically full datastores (95% capacity or higher)
  • Outdated server BIOS
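The datastore condition can be checked with simple arithmetic once the capacity and free space are known. This is a minimal sketch, assuming both values have already been read from your management tooling in bytes; the function names are hypothetical and the 95% default mirrors the threshold mentioned in this article.

```python
def datastore_usage_percent(capacity_bytes, free_bytes):
    """Return the used share of a datastore as a percentage."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if not 0 <= free_bytes <= capacity_bytes:
        raise ValueError("free_bytes must be between 0 and capacity_bytes")
    return 100.0 * (capacity_bytes - free_bytes) / capacity_bytes


def is_critically_full(capacity_bytes, free_bytes, threshold_percent=95.0):
    """True when usage is at or above the threshold (assumed default: 95%)."""
    return datastore_usage_percent(capacity_bytes, free_bytes) >= threshold_percent


# Hypothetical values: a 1000-byte datastore with 40 bytes free is 96% used.
print(is_critically_full(1000, 40))
```

Treating exactly 95% as critical is a conservative choice; raise or lower `threshold_percent` to match the behavior of your storage array.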

Resolution

During a planned maintenance window, follow these steps:

    1. Evacuate all Virtual Machines from the affected host to other healthy hosts

    2. Power off the affected host completely

    3. Contact your hardware vendor to:
      1. Physically reseat all adapter cards in the server
      2. Perform diagnostics on PCIe slots, expansion cards, and motherboard
      3. Replace faulty hardware components if identified

    4. Update the server BIOS to the latest available version

    5. Update firmware for network adapters showing "Needs update" status

    6. Power on the host and verify normal operation

If the issue recurs:

    1. Evacuate VMs to other ESXi hosts

    2. Before rebooting the affected host, follow the procedure in "Using hardware NMI facilities to troubleshoot unresponsive hosts" to initiate a host memory dump

    3. After reboot, generate and upload a new host support bundle that includes the memory dump file for analysis

Additional Information

  • Many storage arrays are configured to switch storage to read-only once usage exceeds 95% of capacity, to prevent data corruption

  • Persistent hardware "Slot/Connector Fault Status" assertions in System Event Logs are a strong indicator of hardware issues

  • Loss of logging (log gaps) during outages can make root cause analysis challenging
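To pin down the outage window, it can help to locate the log gap programmatically. The sketch below assumes each log line starts with an ISO 8601 style timestamp such as 2024-01-15T10:00:00; the function name, the 10-minute threshold, and the sample lines are hypothetical and should be adapted to the logs in your support bundle.

```python
import re
from datetime import datetime, timedelta

# Assumption: lines begin with a timestamp like 2024-01-15T10:00:00.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def find_log_gaps(lines, min_gap=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where logging paused for min_gap or longer."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # Skip continuation lines that carry no timestamp.
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


# Hypothetical log lines, for illustration only.
sample_log = [
    "2024-01-15T10:00:00.123Z message one",
    "2024-01-15T10:01:30.456Z message two",
    "    continuation line without a timestamp",
    "2024-01-15T12:45:10.789Z message three",
]
for start, end in find_log_gaps(sample_log):
    print(f"No log entries between {start} and {end}")
```

The start of the first long gap is a useful reference point when comparing against the timestamps of the SEL assertions.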