ESXi system may crash or hang after 1044 days uptime

Article ID: 313165

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
A system using an EPYC 7002/7Fx2/7Hx2 Series CPU (codenamed Rome) or EPYC 7001 Series CPU (codenamed Naples) may crash or hang after approximately 1044 days of continuous uptime.

Cause

The issue is caused by AMD erratum 1474; refer to the Revision Guide for AMD Family 17h Models 30h-3Fh Processors. If the CC6 (core C6) power-saving state is enabled on an affected CPU, a core may fail to exit CC6 once approximately 1044 days have elapsed since the last system hardware reset. Note that a reboot using VMware QuickBoot is not a system hardware reset.
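Because QuickBoot reboots do not count, the relevant clock is the time since the last full hardware reset, not the time since the last reboot. The following is a minimal sketch of that bookkeeping, assuming the operator keeps a hypothetical log of reboot events as (timestamp, kind) tuples; the function names and the "hardware"/"quickboot" labels are illustrative and not part of any VMware tool.

```python
from datetime import datetime, timedelta

# Approximate failure point from AMD erratum 1474, as stated in this article.
ERRATUM_1474_DAYS = 1044


def days_since_last_hardware_reset(events, now):
    """Return days elapsed since the most recent full hardware reset.

    `events` is a hypothetical list of (timestamp, kind) tuples, where kind is
    "hardware" for a full system hardware reset or "quickboot" for a VMware
    QuickBoot reboot. QuickBoot entries are ignored because they do not
    reset the hardware.
    """
    resets = [ts for ts, kind in events if kind == "hardware"]
    if not resets:
        raise ValueError("no hardware reset recorded")
    return (now - max(resets)) / timedelta(days=1)


def exposed_to_erratum(events, now, cc6_enabled=True):
    """True if CC6 is enabled and ~1044 days have passed since the last hardware reset."""
    return cc6_enabled and days_since_last_hardware_reset(events, now) >= ERRATUM_1474_DAYS
```

For example, a host hardware-reset on 2021-01-01 and QuickBooted twice afterwards has accumulated 1048 days by 2023-11-15, so it is already past the erratum threshold despite its recent reboots.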

Resolution

Starting with ESXi 8.0 Update 3e, ESXi automatically disables CC6 after 1000 days of host uptime.
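The behavior described above can be modeled as a simple threshold check. This is only a sketch of the stated policy, not ESXi's internal implementation; the constant and function names are hypothetical. It shows that disabling CC6 at 1000 days leaves a margin of about 44 days before the approximate 1044-day failure point.

```python
# Thresholds as stated in this article; ESXi's actual internal logic is not
# documented here, so this only models the described behavior.
ESXI_CC6_DISABLE_DAYS = 1000  # ESXi 8.0 Update 3e and later
ERRATUM_1474_DAYS = 1044      # approximate failure point from AMD erratum 1474


def cc6_enabled_under_policy(uptime_days, initially_enabled=True):
    """Return whether CC6 would still be enabled at the given host uptime."""
    return initially_enabled and uptime_days < ESXI_CC6_DISABLE_DAYS


def safety_margin_days():
    """Days between the automatic disable and the erratum's failure point."""
    return ERRATUM_1474_DAYS - ESXI_CC6_DISABLE_DAYS
```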



Workaround:

To work around the issue, use either of the following:

  1. Perform a system hardware reset at least once every 1044 days.
  2. Disable CC6. This may increase power consumption.
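For the first workaround, the date of the next required hardware reset follows directly from the date of the last one. The sketch below assumes the operator knows that date; `margin_days` is a hypothetical safety buffer chosen by the operator, since the 1044-day figure is approximate, and the function name is illustrative.

```python
from datetime import date, timedelta

# Approximate failure point from AMD erratum 1474, as stated in this article.
ERRATUM_1474_DAYS = 1044


def latest_hardware_reset_date(last_reset, margin_days=30):
    """Latest date for the next full system hardware reset.

    `last_reset` is the date of the last full hardware reset (QuickBoot
    reboots do not count). `margin_days` is a hypothetical safety buffer.
    """
    if not 0 <= margin_days < ERRATUM_1474_DAYS:
        raise ValueError("margin_days must be between 0 and 1043")
    return last_reset + timedelta(days=ERRATUM_1474_DAYS - margin_days)
```

For a host last hardware-reset on 2021-01-01, the default 30-day margin gives 2023-10-12; with no margin the limit is 2023-11-11.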

To disable CC6 without a system hardware reset, run the attached Python script in the ESXi Shell. Disabling CC6 this way is not persistent, so the script must be run again after each full system hardware reset.

Alternatively, the system may offer a BIOS setup option to disable CC6 persistently. The procedure depends on the hardware vendor and is not covered in this article.

Attachments

disable_cc6_v2