SFCB crashes on ESXi when SEL records become too numerous

Article ID: 318527

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • sfcb-vmware_bas-zdump or sfcb-vmw_ipmi-zdump core files are found in /var/core
  • Running the following command shows large values for Total Records and Maximum Records. (6000 records have been confirmed to cause the crash, but the actual threshold may be lower.)
[root@esx-lab:~] esxcli hardware ipmi sel get

IpmiSELConfig:
Enabled: true
Formatted-Raw: 00 01 a2 11 ff ff 4a ed 2a 30 60 ed b6 4e 01
Last Added: 2021-11-11T11:11:16
Last Cleared: 2021-06-06T06:06:01
Maximum Records: 10661
Overflow: false
Raw:
Sel-Clock: N/A
Total Records: 6566
Version: 0x51 (1.5)
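
The check above can be scripted. The following is a minimal sketch, not a supported tool: the function names sel_total_records and sel_count_is_high are hypothetical, and the default limit of 6000 is only the figure quoted in the symptom, not a confirmed threshold. It assumes the command output keeps the "Total Records: N" layout shown above.

```shell
# Hypothetical helper: reads `esxcli hardware ipmi sel get` output on stdin
# and prints the value of the "Total Records" field.
sel_total_records() {
    awk -F': *' '/^ *Total Records:/ { print $2; exit }'
}

# Hypothetical helper: succeeds (exit 0) when the count is at or above the limit.
# The default of 6000 is the figure quoted in this article, not a hard threshold.
sel_count_is_high() {
    count="$1"
    limit="${2:-6000}"
    [ "$count" -ge "$limit" ]
}

# Example use in the ESXi shell:
#   total=$(esxcli hardware ipmi sel get | sel_total_records)
#   sel_count_is_high "$total" && echo "SEL holds $total records"
```

Reading from stdin keeps the parsing separate from the esxcli call, so the same function can be run against saved output from a support bundle.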


Environment

VMware ESXi 6.7.x
VMware vSphere ESXi 6.7

Cause

This issue occurs when the IPMI provider runs out of allocated memory while enumerating a large list of System Event Log (SEL) records. Hardware vendors have recently begun to increase the number of SEL records a system can hold, to allow better logging of hardware events. In the past, the maximum was typically 512 or 1024 records.

Resolution

This issue is resolved in vSphere ESXi 7.0 U3i (build number 20842708).
This issue is resolved in vSphere ESXi 6.7 Patch 08 (build number 20497097).
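
To compare a host against the fixed builds listed above, a sketch like the following may help. It assumes the version banner has the form "VMware ESXi 6.7.0 build-20497097", as printed by `vmware -v`. The function name has_sel_fix and its output strings are hypothetical, and release lines other than 6.7 and 7.0 are reported as unknown because this article does not cover them.

```shell
# Hypothetical helper: reads a version banner on stdin, for example
# "VMware ESXi 6.7.0 build-20497097", and prints "fixed" when the build
# number is at or above the fixed build for that release line.
has_sel_fix() {
    read -r banner || return 2
    build=${banner##*build-}          # keep only the text after "build-"
    case "$build" in
        ''|*[!0-9]*) echo "unknown"; return 2 ;;   # no numeric build found
    esac
    case "$banner" in
        *"ESXi 6.7"*) fixed=20497097 ;;   # 6.7 Patch 08
        *"ESXi 7.0"*) fixed=20842708 ;;   # 7.0 U3i
        *) echo "unknown"; return 2 ;;    # release line not covered here
    esac
    if [ "$build" -ge "$fixed" ]; then
        echo "fixed"
    else
        echo "needs-update"
    fi
}

# Example use in the ESXi shell:
#   vmware -v | has_sel_fix
```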

Workaround:
Clearing the SEL records should temporarily work around the crashes until the records become too numerous again.
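
One way to clear the log is the esxcli hardware ipmi sel clear command. The sketch below wraps it so the log is cleared only when the record count is high. The function name clear_sel_if_large, the ESXCLI override variable, and the default limit of 6000 are assumptions made for illustration. Clearing discards the recorded hardware events, so save the output of esxcli hardware ipmi sel list first if the history is needed.

```shell
# ESXCLI is a hypothetical override so the function can be exercised
# off-host with a stand-in command; on a host it defaults to esxcli.
ESXCLI="${ESXCLI:-esxcli}"

# Hypothetical helper: clears the SEL only when Total Records is at or
# above the limit (default 6000, the figure quoted in this article).
clear_sel_if_large() {
    limit="${1:-6000}"
    total=$($ESXCLI hardware ipmi sel get |
        awk -F': *' '/^ *Total Records:/ { print $2; exit }')
    if [ "${total:-0}" -ge "$limit" ]; then
        $ESXCLI hardware ipmi sel clear && echo "cleared $total records"
    else
        echo "no action: $total records"
    fi
}

# Example use in the ESXi shell:
#   clear_sel_if_large
```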

Alternatively, increase the memory allocated to the vmw_ipmi resource pool. This also works, but may cause memory contention elsewhere on heavily utilized ESXi hosts. Run the following commands in order; the last two disable and re-enable the WBEM service:
  • echo "provMemOveride:powerpath=240,vmware_base=150,vmw_ipmi=150" >> /etc/sfcb/sfcb.cfg
  • esxcli system wbem set -e 0
  • esxcli system wbem set -e 1
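
Because the first step appends with >>, running it twice leaves a duplicate line in the file. A minimal sketch of a guard follows. The function name add_mem_override and the CFG and LINE variables are hypothetical; the configuration line itself is copied unchanged from the step above.

```shell
# CFG defaults to the path used in the steps above; override it to try
# the function on a scratch file first.
CFG="${CFG:-/etc/sfcb/sfcb.cfg}"
LINE='provMemOveride:powerpath=240,vmware_base=150,vmw_ipmi=150'

# Hypothetical helper: appends the override line only if no line with
# the same key is already present in the file.
add_mem_override() {
    if grep -q '^provMemOveride:' "$CFG" 2>/dev/null; then
        echo "override already present in $CFG"
    else
        echo "$LINE" >> "$CFG" && echo "override added to $CFG"
    fi
}

# Example use in the ESXi shell, followed by the WBEM restart from the steps above:
#   add_mem_override
#   esxcli system wbem set -e 0
#   esxcli system wbem set -e 1
```

The guard only detects an existing line with the same key; it does not compare or update the values on that line.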