ESXi Host Becomes Unresponsive with "Failed to Complete Filtering of Stickybit Files" Error Due to Audit Logging Configuration Issues

Article ID: 391047

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

ESXi hosts may become unresponsive or display limited functionality with the following symptoms:

  • Host connection state appears red in vCenter (Not Responding)
  • SSH connection attempts result in "Connection refused" errors even when SSH is enabled in the DCUI
  • Errors in logs about "Failed to complete filtering of stickybit files"
  • Command line interface becomes unresponsive or inaccessible
  • Console displays crond messages and does not accept input
  • Unable to generate ESXi host log bundles
  • vmsyslogd daemon may be stopped

Environment

  • VMware ESXi 7.0.x or newer

Cause

This issue has multiple related causes, all stemming from audit logging configuration:

  1. Enabling audit logging without properly configuring a scratch location for the audit logs
  2. Changing the scratch location on a host that already has audit record storage enabled
  3. Manually creating the audit record storage directory before configuring it (the directory must be created by the system)

When any of these conditions occurs:

  • The syslog daemon may crash when it cannot find or access the audit directory
  • The hostd process may stop logging normally
  • The host may become unresponsive to management operations

In the error logs, there may be messages similar to:

vmsyslog.loggers.audit: ERROR] Files are missing from the audit record storage directory

Or in vCenter logs:

Failed to complete filtering of stickybit files
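To confirm that a host is hitting this issue, the two messages above can be searched for in whatever log files are available. The following is a minimal sketch rather than a supported tool: the function name find_audit_errors is hypothetical, and the caller supplies the log file paths because this article does not name the specific files.

```shell
#!/bin/sh
# Hypothetical helper: print every line in the given log files that contains
# either error message quoted in this article.
find_audit_errors() {
    for log in "$@"; do
        # Skip paths that do not exist or cannot be read.
        [ -r "$log" ] || continue
        # -F: match fixed strings, -H: print the file name, -n: print the line number
        grep -F -H -n \
            -e "Files are missing from the audit record storage directory" \
            -e "Failed to complete filtering of stickybit files" \
            "$log"
    done
    # A log with no matches is not an error for this helper.
    return 0
}
```

For example, find_audit_errors /var/log/*.log searches the host's local logs (the path is an assumption; adjust it to where the logs are stored). No output means neither message was found in the files given.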

Resolution

To resolve this issue, follow these steps:

  1. Access the ESXi host through any available method:

    • Direct console (DCUI)
    • SSH (if available)
    • Remote console through iLO/iDRAC/IPMI
  2. Disable and then re-enable the local audit record storage:

    esxcli system auditrecords local disable
    esxcli system auditrecords local enable
    
  3. Check if the vmsyslogd service is running:

    /etc/init.d/vmsyslogd status
    
  4. If vmsyslogd is not running, start it:

    /etc/init.d/vmsyslogd start
    
  5. If the host is still not responding, perform a complete power drain:

    • Power off the host completely (not just reboot)
    • Leave the host powered off for 5 minutes
    • Power the host back on
  6. After the host is back online, properly configure audit logging with the following steps:

    a. Ensure no audit logging directory exists yet at the target location (verify the folder doesn't exist)

    b. Configure the audit record storage location:

    esxcli system auditrecords local set --directory /vmfs/volumes/datastore_name/audit/hostname
    

    Note: Do not manually create this directory - let the system create it

    c. Enable audit logging:

    esxcli system auditrecords local enable
    

    d. Verify vmsyslogd is running:

    /etc/init.d/vmsyslogd status
    

    e. Check that logging is functioning correctly
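Steps 3, 4, and 6 above can be combined into a short shell sketch. This is an illustration under stated assumptions, not a supported script: the function names and the VMSYSLOGD_INIT and ESXCLI variables are hypothetical, and the check assumes that the status output contains the text "is running" when the daemon is up. Verify that on your build before relying on it.

```shell
#!/bin/sh
# Hypothetical overrides so the sketch can be exercised away from a real host.
VMSYSLOGD_INIT="${VMSYSLOGD_INIT:-/etc/init.d/vmsyslogd}"
ESXCLI="${ESXCLI:-esxcli}"

# Steps 3-4: start vmsyslogd only if the status output does not report it running.
# Assumption: the status output contains "is running" when the daemon is up.
ensure_vmsyslogd_running() {
    if "$VMSYSLOGD_INIT" status 2>&1 | grep -q "is running"; then
        echo "vmsyslogd already running"
    else
        echo "vmsyslogd not running, starting it"
        "$VMSYSLOGD_INIT" start
    fi
}

# Step 6: refuse to continue if the target directory already exists, then set
# the storage location, enable audit record storage, and re-check the daemon.
configure_audit_storage() {
    dir="$1"
    if [ -e "$dir" ]; then
        echo "refusing: $dir already exists; the system must create it" >&2
        return 1
    fi
    "$ESXCLI" system auditrecords local set --directory "$dir" || return 1
    "$ESXCLI" system auditrecords local enable || return 1
    ensure_vmsyslogd_running
}
```

For example, configure_audit_storage /vmfs/volumes/datastore_name/audit/hostname, with the placeholder path replaced by the real one. The existence check comes first because a manually created directory is one of the causes listed in this article.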

For STIG hardening compliance, ensure that audit logs are stored on persistent storage, but follow the steps above carefully to avoid this issue.

Additional Information

To prevent this issue from occurring:

  • Always configure audit logging only after the scratch configuration and any host profiles have been applied
  • Never manually create the audit record storage directory
  • When changing scratch locations, first disable audit logging, then change the scratch location and reboot, then re-enable audit logging
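The ordering in the last point can be sketched as two shell functions, one to run before the reboot and one after it. This is a hedged illustration only. The function names and the DRY_RUN switch are hypothetical, and setting the scratch location with vim-cmd through the ScratchConfig.ConfiguredScratchLocation advanced option is an assumption that this article does not itself state. Confirm the correct method for your ESXi version first.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Phase 1 (before the reboot): disable audit record storage first,
# then point the host at the new scratch location.
prepare_scratch_change() {
    new_scratch="$1"
    run esxcli system auditrecords local disable || return 1
    # Assumption: scratch is set through this advanced option.
    run vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation \
        string "$new_scratch" || return 1
    echo "Reboot the host, then run finish_scratch_change"
}

# Phase 2 (after the reboot): re-enable audit record storage.
finish_scratch_change() {
    run esxcli system auditrecords local enable
}
```

Running with DRY_RUN=1 prints the commands without executing them, which allows the order to be reviewed first. The reboot is left as a manual step, and the functions must be loaded again after the host comes back up.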

For more information on related topics, see:

  • KB 376958: Configuring Scratch configuration and Syslog.global.auditRecord.storageEnable causes the ESXi host to go unresponsive or hostd not logging
  • KB 370221: Configuring Audit Logging for STiG hardening of ESXi hosts fails with the error: "invalid file location"