Identifying virtual machine Monitor Panics caused by faulty CPUs

Article ID: 339463

Products

VMware vSphere ESXi

Issue/Introduction

  • At random intervals, a virtual machine experiences a Monitor Panic.
  • The entries in the VMkernel log are similar to:

    May 21 12:02:40 usmarccesx001 vmkernel: 25:04:22:20.312 cpu9:1232)Backtrace for current CPU #9, worldID=1232, ebp=0x3743f88
    Jun 11 17:28:57 usmarccesx001 vmkernel: 0:00:20:12.333 cpu8:1071)Backtrace for current CPU #8, worldID=1071, ebp=0x34bff88


  • In ESXi 5.0, the entries in the vmkernel.log file are similar to:

    2012-02-29T21:32:19.374Z cpu57:641308)UserDump: 1625: Dumping cartel 649488 (from world 641308) to file /vmfs/volumes/4de9348d-dfb113a5-a1f9-a4badb22da3f/vi-facstaf1-154/vmx-zdump.000
    2012-02-29T21:33:24.699Z cpu59:641308)UserDump: 1737: Userworld coredump complete.

  • In ESXi 5.0, the entries in vobd.log are similar to:

    2012-02-29T21:33:24.699Z: [UserWorldCorrelator] 764479573493us: [vob.uw.core.dumped] /bin/vmx (641308) /vmfs/volumes/4de9348d-dfb####-####-########a3f/vi-facstaf1-154/vmx-zdump.000
    2012-02-29T21:33:24.700Z: [UserWorldCorrelator] 764483574372us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (2 time(s) so far). A core file may have been created at /vmfs/volumes/4de9348d-dfb1####-####-########a3f/vi-facstaf1-154/vmx-zdump.000

Environment

VMware ESX Server 3.0.x
VMware ESX Server 3.5.x
VMware ESXi 4.1.x Embedded
VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded
VMware ESX 4.1.x
VMware ESXi 4.0.x Installable
VMware vSphere ESXi 5.0
VMware ESXi 4.1.x Installable
VMware ESXi 3.5.x Embedded
VMware vSphere ESXi 5.1
VMware ESXi 3.5.x Installable

Resolution

This issue is likely caused by a faulty CPU. In the example VMkernel log entries, the monitor faults occur on cpu8 and cpu9.
 
To resolve this issue, you must determine the faulty CPU and replace it.
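
When many backtrace entries are present, it can help to tally which logical CPUs they were logged on. The following is a minimal Python sketch, not a VMware tool. It assumes the entries follow the "cpuN:worldID)Backtrace for current CPU #N" form shown in the example above. The function name count_backtrace_cpus and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Assumption: backtrace entries look like the examples in this article,
# for instance "cpu9:1232)Backtrace for current CPU #9, worldID=1232, ...".
BACKTRACE_RE = re.compile(r"cpu(\d+):\d+\)Backtrace for current CPU #(\d+)")


def count_backtrace_cpus(lines):
    """Count how many backtrace entries were logged on each logical CPU."""
    counts = Counter()
    for line in lines:
        match = BACKTRACE_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# The two example entries from the Issue/Introduction section.
sample = [
    "May 21 12:02:40 usmarccesx001 vmkernel: 25:04:22:20.312 cpu9:1232)"
    "Backtrace for current CPU #9, worldID=1232, ebp=0x3743f88",
    "Jun 11 17:28:57 usmarccesx001 vmkernel: 0:00:20:12.333 cpu8:1071)"
    "Backtrace for current CPU #8, worldID=1071, ebp=0x34bff88",
]

for cpu, hits in sorted(count_backtrace_cpus(sample).items()):
    print(f"cpu{cpu}: {hits} backtrace(s)")
```

If the backtraces cluster on the same few logical CPUs across unrelated virtual machines, that points toward hardware rather than a guest or configuration problem.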
 
For example, if the host has four dual-core Intel CPUs and hyperthreading is enabled, the logical CPUs are numbered as follows:
CPU 1, core 1, hyperthread 1 = cpu0
CPU 1, core 1, hyperthread 2 = cpu1
CPU 1, core 2, hyperthread 1 = cpu2
CPU 1, core 2, hyperthread 2 = cpu3
CPU 2, core 1, hyperthread 1 = cpu4
CPU 2, core 1, hyperthread 2 = cpu5
CPU 2, core 2, hyperthread 1 = cpu6
CPU 2, core 2, hyperthread 2 = cpu7
CPU 3, core 1, hyperthread 1 = cpu8
CPU 3, core 1, hyperthread 2 = cpu9
CPU 3, core 2, hyperthread 1 = cpu10
CPU 3, core 2, hyperthread 2 = cpu11
CPU 4, core 1, hyperthread 1 = cpu12
CPU 4, core 1, hyperthread 2 = cpu13
CPU 4, core 2, hyperthread 1 = cpu14
CPU 4, core 2, hyperthread 2 = cpu15
In this example, cpu8 and cpu9 are the two hyperthreads of the first core of CPU 3.

Replace CPU 3 to resolve the issue.
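
The numbering in the table above is plain integer arithmetic, so the lookup can be sketched in Python for hosts with a different core count. This sketch assumes the same sequential enumeration as the example (all hyperthreads of a core, then all cores of a CPU, then the next CPU). A real host may enumerate logical CPUs differently, so confirm against the hardware documentation before replacing a part. The function name locate_logical_cpu and its parameters are illustrative assumptions.

```python
def locate_logical_cpu(cpu_number, cores_per_socket=2, threads_per_core=2):
    """Map a zero-based logical CPU number (the N in cpuN) to one-based
    (physical CPU, core, hyperthread), assuming sequential enumeration."""
    threads_per_socket = cores_per_socket * threads_per_core
    socket = cpu_number // threads_per_socket + 1
    core = (cpu_number % threads_per_socket) // threads_per_core + 1
    thread = cpu_number % threads_per_core + 1
    return socket, core, thread


# The two logical CPUs from the example log entries.
for cpu in (8, 9):
    socket, core, thread = locate_logical_cpu(cpu)
    print(f"cpu{cpu} = CPU {socket}, core {core}, hyperthread {thread}")
```

With the defaults, which match the four dual-core CPU example, both cpu8 and cpu9 resolve to core 1 of CPU 3.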

You may see a similar user-visible error on ESXi 5.0. This error is also written to the vmware.log file of the virtual machine.

*** VMware ESX internal monitor error ***

You can report this problem by selecting menu item Help -> VMware on the Web -> Request Support, or by going to Location of the core file for the virtual machine /vmfs/volumes/datastorename/vmname/vmmcores.gz.

Provide the log file (vmware.log) and the core file(s) (/vmfs/volumes/datastorename/vmname/vmmcores.gz, /vmfs/volumes/datastorename/vmname/vmx-zdump.000).

If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure Virtual Machine Web page. Then reproduce the incident and file it according to the instructions.