The issue is caused by AMD erratum 1474; please refer to the Revision Guide for AMD Family 17h Models 30h-3Fh Processors. If the CC6 (core C6) power saving state is enabled on an affected CPU, a core may fail to exit CC6 about 1044 days after the last system hardware reset. Note that a reboot using VMware QuickBoot is not a system hardware reset.
Starting with ESXi 8.0 Update 3e, ESXi automatically disables CC6 after 1000 days of host uptime.
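As a rough illustration of the timing involved, the following Python sketch compares the time since the last system hardware reset against the two figures mentioned here: the approximate 1044-day point described by the erratum and the 1000-day uptime point at which ESXi disables CC6. It is not the attached script and does not query the host; the function names are hypothetical, and the caller is assumed to supply the elapsed time in seconds. Because a QuickBoot reboot is not a system hardware reset, the host uptime reported after a QuickBoot may be shorter than the time that actually matters.

```python
SECONDS_PER_DAY = 86_400
ERRATUM_DAYS = 1044       # approximate point at which a core may fail to exit CC6
AUTO_DISABLE_DAYS = 1000  # uptime at which ESXi 8.0 Update 3e and later disable CC6


def days_since_reset(seconds_since_reset: float) -> float:
    """Convert elapsed seconds since the last hardware reset to days."""
    return seconds_since_reset / SECONDS_PER_DAY


def margin_days(seconds_since_reset: float, threshold_days: int = ERRATUM_DAYS) -> float:
    """Days left before the threshold; negative once the threshold has passed."""
    return threshold_days - days_since_reset(seconds_since_reset)


def status(seconds_since_reset: float) -> str:
    """Classify the elapsed time against both thresholds (illustrative labels)."""
    days = days_since_reset(seconds_since_reset)
    if days >= ERRATUM_DAYS:
        return "past erratum threshold"
    if days >= AUTO_DISABLE_DAYS:
        return "past ESXi auto-disable point"
    return "below both thresholds"


if __name__ == "__main__":
    # Hypothetical host that was last hardware-reset 900 days ago.
    elapsed = 900 * SECONDS_PER_DAY
    print(status(elapsed), margin_days(elapsed))
```

The 44-day gap between the two thresholds is the safety margin that the automatic behavior leaves before the erratum can trigger.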
To work around the issue, use one of the following options:
Disable CC6 without a system hardware reset by running the attached Python script in the ESXi shell. Disabling CC6 this way is not persistent, so the script must be run again after each system hardware reset.
Alternatively, the machine may provide a way to persistently disable CC6 as a BIOS setup option. Details on how to do this depend on the hardware vendor and cannot be provided in this article.