Identifying virtual machine Monitor Panics caused by faulty CPUs

Products

VMware vSphere ESXi

Issue/Introduction

At random intervals, a virtual machine experiences a Monitor Panic.
The entries in the VMkernel log are similar to:

May 21 12:02:40 usmarccesx001 vmkernel: 25:04:22:20.312 cpu9:1232)Backtrace for current CPU #9, worldID=1232, ebp=0x3743f88
Jun 11 17:28:57 usmarccesx001 vmkernel: 0:00:20:12.333 cpu8:1071)Backtrace for current CPU #8, worldID=1071, ebp=0x34bff88

In ESXi 5.0, the entries in the vmkernel.log file are similar to:

2012-02-29T21:32:19.374Z cpu57:641308)UserDump: 1625: Dumping cartel 649488 (from world 641308) to file /vmfs/volumes/4de9348d-dfb113a5-a1f9-a4badb22da3f/vi-facstaf1-154/vmx-zdump.000
2012-02-29T21:33:24.699Z cpu59:641308)UserDump: 1737: Userworld coredump complete.

In ESXi 5.0, the entries in vobd.log are similar to:

2012-02-29T21:33:24.699Z: [UserWorldCorrelator] 764479573493us: [vob.uw.core.dumped] /bin/vmx (641308) /vmfs/volumes/4de9348d-dfb####-####-########a3f/vi-facstaf1-154/vmx-zdump.000
2012-02-29T21:33:24.700Z: [UserWorldCorrelator] 764483574372us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (2 time(s) so far). A core file may have been created at /vmfs/volumes/4de9348d-dfb1####-####-########a3f/vi-facstaf1-154/vmx-zdump.000
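As a rough aid for spotting repeated crashes, the following Python sketch pulls the application path, crash count, and core file path out of an esx.problem.application.core.dumped entry. The regular expression is modeled only on the sample vobd.log line above, and the function name parse_core_dumped is hypothetical; other builds may word the message differently.

```python
import re

# Pattern modeled on the sample vobd.log entry shown above (an assumption;
# the message wording may differ between ESXi builds).
CORE_DUMPED_RE = re.compile(
    r"\[esx\.problem\.application\.core\.dumped\] "
    r"An application \(([^)]+)\) running on ESXi host has crashed "
    r"\((\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (\S+)"
)

def parse_core_dumped(line):
    """Return (application, crash_count, core_path), or None if no match."""
    match = CORE_DUMPED_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)

sample_vobd_line = (
    "2012-02-29T21:33:24.700Z: [UserWorldCorrelator] 764483574372us: "
    "[esx.problem.application.core.dumped] An application (/bin/vmx) "
    "running on ESXi host has crashed (2 time(s) so far). A core file may "
    "have been created at /vmfs/volumes/4de9348d-dfb1####-####-########a3f/"
    "vi-facstaf1-154/vmx-zdump.000"
)

print(parse_core_dumped(sample_vobd_line))
```

A crash count greater than 1 for /bin/vmx suggests the panic is recurring, which is consistent with the random-interval behavior described in this article.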

Environment

VMware ESX Server 3.0.x
VMware ESX Server 3.5.x
VMware ESXi 4.1.x Embedded
VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded
VMware ESX 4.1.x
VMware ESXi 4.0.x Installable
VMware vSphere ESXi 5.0
VMware ESXi 4.1.x Installable
VMware ESXi 3.5.x Embedded
VMware vSphere ESXi 5.1
VMware ESXi 3.5.x Installable

Resolution

This issue is likely caused by a faulty CPU device. In the example VMkernel log entries, the monitor faults are occurring on cpu8 and cpu9.
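To see whether the panics cluster on particular logical CPUs, the backtrace entries can be tallied per CPU number. The following Python sketch assumes the log lines contain the text "Backtrace for current CPU #N" exactly as in the sample entries; the function name count_backtraces_by_cpu is hypothetical.

```python
import re
from collections import Counter

# Matches the "Backtrace for current CPU #N" text from the sample entries.
BACKTRACE_RE = re.compile(r"Backtrace for current CPU #(\d+)")

def count_backtraces_by_cpu(lines):
    """Return a Counter mapping logical CPU number to backtrace count."""
    counts = Counter()
    for line in lines:
        match = BACKTRACE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# The two sample VMkernel entries from this article.
sample_lines = [
    "May 21 12:02:40 usmarccesx001 vmkernel: 25:04:22:20.312 cpu9:1232)"
    "Backtrace for current CPU #9, worldID=1232, ebp=0x3743f88",
    "Jun 11 17:28:57 usmarccesx001 vmkernel: 0:00:20:12.333 cpu8:1071)"
    "Backtrace for current CPU #8, worldID=1071, ebp=0x34bff88",
]

for cpu, hits in sorted(count_backtraces_by_cpu(sample_lines).items()):
    print(f"cpu{cpu}: {hits} backtrace(s)")
```

If the counts concentrate on logical CPUs that belong to the same physical package, that package is the likely suspect.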

To resolve this issue, you must determine the faulty CPU and replace it.

For example, if the host has four dual-core Intel CPUs with hyperthreading enabled, the logical CPUs are numbered as follows:

CPU 1, core 1, hyperthread 1 = cpu0
CPU 1, core 1, hyperthread 2 = cpu1
CPU 1, core 2, hyperthread 1 = cpu2
CPU 1, core 2, hyperthread 2 = cpu3
CPU 2, core 1, hyperthread 1 = cpu4
CPU 2, core 1, hyperthread 2 = cpu5
CPU 2, core 2, hyperthread 1 = cpu6
CPU 2, core 2, hyperthread 2 = cpu7
CPU 3, core 1, hyperthread 1 = cpu8
CPU 3, core 1, hyperthread 2 = cpu9
CPU 3, core 2, hyperthread 1 = cpu10
CPU 3, core 2, hyperthread 2 = cpu11
CPU 4, core 1, hyperthread 1 = cpu12
CPU 4, core 1, hyperthread 2 = cpu13
CPU 4, core 2, hyperthread 1 = cpu14
CPU 4, core 2, hyperthread 2 = cpu15

In this example, cpu8 and cpu9 are the two hyperthreads of the first core of CPU 3.

Replace CPU 3 to resolve the issue.
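The numbering in the example table can be expressed as simple integer arithmetic. The following Python sketch encodes only that example layout (two cores per CPU, two hyperthreads per core, numbered sequentially); the function name locate_logical_cpu is hypothetical, and the numbering on a particular host may not follow this scheme, so confirm the mapping for your hardware before replacing anything.

```python
def locate_logical_cpu(cpu, cores_per_cpu=2, threads_per_core=2):
    """Map a logical CPU number (cpuN) to (CPU, core, hyperthread).

    All returned values are 1-based, matching the example table.
    Assumes sequential numbering as in the table; not verified for
    any specific hardware.
    """
    if cpu < 0:
        raise ValueError("logical CPU number must be non-negative")
    threads_per_cpu = cores_per_cpu * threads_per_core
    physical_cpu = cpu // threads_per_cpu + 1
    core = (cpu % threads_per_cpu) // threads_per_core + 1
    hyperthread = cpu % threads_per_core + 1
    return physical_cpu, core, hyperthread

# The two logical CPUs reported in the sample VMkernel entries.
for logical in (8, 9):
    physical_cpu, core, hyperthread = locate_logical_cpu(logical)
    print(f"cpu{logical} = CPU {physical_cpu}, core {core}, "
          f"hyperthread {hyperthread}")
```

For the sample entries, both cpu8 and cpu9 resolve to the same physical CPU and core, which is why a single CPU is the replacement candidate.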

On ESXi 5.0, you may see a similar user-visible error. This error is also written to the vmware.log file of the virtual machine.

*** VMware ESX internal monitor error ***

You can report this problem by selecting menu item Help -> VMware on the Web -> Request Support.

Location of the core file for the virtual machine: /vmfs/volumes/datastorename/vmname/vmmcores.gz

Provide the log file (vmware.log) and the core file(s) (/vmfs/volumes/datastorename/vmname/vmmcores.gz, /vmfs/volumes/datastorename/vmname/vmx-zdump.000).

If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure Virtual Machine Web page. Then reproduce the incident and file it according to the instructions.