VMkernel reports corrected memory errors and retirement of memory pages

Article ID: 398205

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • The VMkernel logs multiple memory read error events, as reported by the Machine Check Architecture (MCA), on the ESXi host.

/var/run/log/vmkernel.log

cpu26:2104586)MCA: 209: CE Intr G0 B11 S8c00004200800090 Aaca6f602c0 M8000000000000086 Paca6f602c0/40 Memory Controller Read Error on Channel 0.
cpu48:2104583)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8b2e7e7940 M8000000000000086 P8b2e7e7940/40 Memory Controller Read Error on Channel 0.
cpu48:2104583)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8b2e7e7940 M8000000000000086 P8b2e7e7940/40 Memory Controller Read Error on Channel 0.
cpu15:308154417)MCA: 209: CE Poll G0 B11 S8c00014200800090 Aa9738e7240 M8000000000000086 Pa9738e7240/40 Memory Controller Read Error on Channel 0.
cpu15:308154417)MCA: 209: CE Poll G0 B11 S8c00014200800090 Aa9738e7240 M8000000000000086 Pa9738e7240/40 Memory Controller Read Error on Channel 0.
cpu9:2104581)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8ace525d40 M8000000000000086 P8ace525d40/40 Memory Controller Read Error on Channel 0.
cpu9:2104581)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8ace525d40 M8000000000000086 P8ace525d40/40 Memory Controller Read Error on Channel 0.
cpu38:2104588)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8f22aa89c0 M8000000000000086 P8f22aa89c0/40 Memory Controller Read Error on Channel 0.
cpu38:2104588)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8f22aa89c0 M8000000000000086 P8f22aa89c0/40 Memory Controller Read Error on Channel 0.

  • The VMkernel and vmkwarning logs on the ESXi host show that multiple memory pages have been selected for retirement.

/var/run/log/vmkwarning.log

cpu21:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 256
cpu40:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 512
cpu4:2097272)WARNING: PageRetire: 628: Number of user shared MPNs selected for retirement is 8
cpu40:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 512
cpu49:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 1024
cpu49:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 1024
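
When the log contains many MCA entries, it can help to see which addresses repeat before contacting the hardware vendor. The following is a minimal Python sketch that counts the corrected-error lines shown above per bank and address. The article does not document the individual fields, so the group names (`bank`, `status`, `addr`, `misc`, `phys`) are assumptions based on the letter prefixes in the log line, and `summarize_mca` is a hypothetical helper, not a VMware tool.

```python
import re
from collections import Counter

# Matches the "MCA: ... CE Intr|Poll ..." lines from vmkernel.log shown above.
# Group names are assumptions inferred from the single-letter prefixes.
MCA_RE = re.compile(
    r"cpu(?P<cpu>\d+):(?P<world>\d+)\)MCA: \d+: "
    r"CE (?P<source>Intr|Poll) "
    r"G(?P<g>\d+) B(?P<bank>\d+) "
    r"S(?P<status>[0-9a-f]+) A(?P<addr>[0-9a-f]+) "
    r"M(?P<misc>[0-9a-f]+) P(?P<phys>[0-9a-f]+)/(?P<size>[0-9a-f]+) "
    r"(?P<message>.+)$"
)


def summarize_mca(lines):
    """Count corrected-error lines per (bank number, value after 'P')."""
    counts = Counter()
    for line in lines:
        match = MCA_RE.search(line.strip())
        if match:
            counts[(int(match["bank"]), match["phys"])] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    "cpu26:2104586)MCA: 209: CE Intr G0 B11 S8c00004200800090 Aaca6f602c0 "
    "M8000000000000086 Paca6f602c0/40 Memory Controller Read Error on Channel 0.",
    "cpu48:2104583)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8b2e7e7940 "
    "M8000000000000086 P8b2e7e7940/40 Memory Controller Read Error on Channel 0.",
    "cpu48:2104583)MCA: 209: CE Intr G0 B11 S8c00004200800090 A8b2e7e7940 "
    "M8000000000000086 P8b2e7e7940/40 Memory Controller Read Error on Channel 0.",
]

if __name__ == "__main__":
    # Print the most frequently reported addresses first.
    for (bank, phys), count in summarize_mca(SAMPLE).most_common():
        print(f"bank {bank} address {phys}: {count} corrected error(s)")
```

To run it against a real host log, pass the lines of a copy of /var/run/log/vmkernel.log to `summarize_mca` instead of `SAMPLE`. Lines that do not match the pattern are ignored.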

Environment

vSphere ESXi

Cause

  • Corrected errors (CE) indicate repeated memory read failures, which point to a faulty memory module or memory controller.
  • Page retirement events indicate that the hardware is instructing the VMkernel not to use certain memory regions, because they have seen consistent failures and are not safe to use.
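
To gauge how far page retirement has progressed, the PageRetire warnings can be tallied by page type. The sketch below is illustrative only: `max_retired` is a hypothetical helper, and because the article does not state whether the reported numbers are cumulative, it simply reports the largest value seen for each type.

```python
import re

# Matches the "PageRetire" warnings from vmkwarning.log shown in this article.
PAGE_RETIRE_RE = re.compile(
    r"WARNING: PageRetire: \d+: Number of (?P<kind>.+?) MPNs "
    r"selected for retirement is (?P<count>\d+)"
)


def max_retired(lines):
    """Return the largest count seen per MPN type (for example 'kernel')."""
    highest = {}
    for line in lines:
        match = PAGE_RETIRE_RE.search(line)
        if match:
            kind = match["kind"]
            count = int(match["count"])
            highest[kind] = max(count, highest.get(kind, 0))
    return highest


# Lines copied from the vmkwarning.log excerpt in this article.
SAMPLE_WARNINGS = [
    "cpu21:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 256",
    "cpu40:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 512",
    "cpu4:2097272)WARNING: PageRetire: 628: Number of user shared MPNs selected for retirement is 8",
    "cpu49:2097272)WARNING: PageRetire: 624: Number of kernel MPNs selected for retirement is 1024",
]

if __name__ == "__main__":
    for kind, count in max_retired(SAMPLE_WARNINGS).items():
        print(f"{kind}: up to {count} MPNs selected for retirement")
```

A steadily rising value across successive warnings, as in the excerpt, is useful supporting evidence to include when opening a case with the hardware vendor.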

Resolution

Engage the hardware vendor for further diagnostics and possible replacement of the faulty components.

Additional Information

  • This issue may also lead to CPU lockups and to the host entering a not responding state.
  • In some cases, when the hardware fails to correct the error, VMs may fail or the server may encounter a PSOD.