SNMP traps indicating hardware issues are received from an ESXi host, but no physical hardware failure is found

Article ID: 424615

Products

VMware vSphere ESXi

Issue/Introduction

  • An SNMP trap is received from an ESXi host indicating a hardware-level issue. 
    Examples:
    Assert + Power Supply Failure status
    Assert + Power Supply Power Supply AC lost
  • No physical hardware abnormalities are found.
  • Reviewing the IPMI System Event Log (SEL) with the following command shows that entries with similar content were already logged in the past:
    localcli hardware ipmi sel list -p -i -n all

Environment

ESXi 8.0

Cause

A known issue exists where the snmpd service on an ESXi host, upon restarting, may re-send SEL entries that were previously recorded and already notified, presenting them as new SNMP traps.

If you receive SNMP traps similar to those described in the Issue/Introduction section but find no physical hardware issues, the traps may be false positives caused by this snmpd behavior.

Resolution

Broadcom engineering is aware of this issue and is working on a fix for a future release.

To verify if the notified SNMP trap is a false positive, follow the steps below:

  • Check for snmpd restart at the time of the trap

Verify if snmpd stopped or started around the time the trap was detected.

    1. Log in to the ESXi host as the root user via SSH.
    2. Review /var/run/log/syslog.log and check whether the following messages were recorded in the 1-2 minutes before the trap was detected:
      snmpd: SNMP Research EMANATE/Lite Agent Version ##.#.#.#
      snmpd: Copyright 1989-2011 SNMP Research, Inc.
      LoadV3Users: loaded 1 users from config file
      LoadV3Targets: loaded 1 notification targets
      send_env_notifications: sent [x] of ### SEL entries as notifications, 0 already sent
      Note: The value [x] will be a non-zero number.
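
The log check above can be sketched as two small shell helpers. This is a minimal sketch, not an official tool: the function names are hypothetical, and the patterns assume the log lines look exactly like the examples in this article (an agent banner at startup and a send_env_notifications line with a non-zero "sent" count).

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the patterns assume
# the syslog lines match the examples shown in this article.

# snmpd_start_lines [LOGFILE]
# Prints the agent banner lines that snmpd logs when it starts.
snmpd_start_lines() {
    logfile="${1:-/var/run/log/syslog.log}"
    grep -F 'snmpd: SNMP Research EMANATE/Lite Agent Version' "$logfile"
}

# find_sel_resends [LOGFILE]
# Prints send_env_notifications lines whose "sent" count is non-zero,
# i.e. startups where snmpd sent SEL entries as notifications.
find_sel_resends() {
    logfile="${1:-/var/run/log/syslog.log}"
    grep -E 'send_env_notifications: sent [1-9][0-9]* of [0-9]+ SEL entries' "$logfile"
}
```

Running `snmpd_start_lines` and `find_sel_resends` on the host without arguments reads /var/run/log/syslog.log. The timestamps on any printed lines can then be compared with the time the trap was received. Both helpers return a non-zero exit status when nothing matches.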
  • Verify the timestamp of the notified SEL entry

If an snmpd restart is confirmed, verify that the timestamp of the notified SEL entry points to a past date/time.

    1. Identify the following string within the received SNMP notification: 6876.4.20.3.1.4.
    2. This string is followed by information in the format of [Record ID]:[Message]. Use this to identify the Record ID and Message of the SEL entry associated with the trap.
    3. Connect to the ESXi host via SSH and run the following command to list the recorded SEL entries: 
      localcli hardware ipmi sel list -p -i -n all
    4. Locate the SEL entry in the list that matches the Record ID and Message identified in step 2, and check its timestamp.
    5. If the timestamp of the SEL entry notified by SNMP points to a past date and time (earlier than the time the trap was received), the SNMP trap can be confirmed as a false positive caused by this issue.
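
Steps 2 to 4 above can be sketched in shell as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the input is assumed to be only the [Record ID]:[Message] text that follows the 6876.4.20.3.1.4. string, and nothing is assumed about the layout of the SEL listing beyond the message text appearing somewhere in it.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical. The input is assumed to be
# the "[Record ID]:[Message]" text taken from the received notification.

# split_sel_varbind VALUE
# Splits VALUE at the first colon and prints the Record ID and the Message
# on separate lines. If VALUE has no colon, both lines repeat VALUE.
split_sel_varbind() {
    value="$1"
    record_id="${value%%:*}"   # text before the first colon
    message="${value#*:}"      # text after the first colon
    printf '%s\n%s\n' "$record_id" "$message"
}

# find_sel_entry_by_message MESSAGE
# Filters the SEL listing (command taken from this article) for lines that
# contain MESSAGE as a fixed string. Run on the ESXi host only.
find_sel_entry_by_message() {
    localcli hardware ipmi sel list -p -i -n all | grep -F -- "$1"
}
```

For example, `split_sel_varbind '42:Assert + Power Supply Failure status'` prints `42` and then the message text, where `42` is a made-up Record ID. If the matching line from `find_sel_entry_by_message` does not include the timestamp, review the full listing around that entry instead.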