This article explains how to trace an application running on a Windows Diego Cell and trigger a core dump when it faults. We will not go into detail about how to analyze the dump once it is collected. Triggering a core dump is particularly useful when an app that has been running for some time suddenly crashes with an error such as "Access Violation" and you want to determine why the fault occurred.
What you will need
What are the symptoms?
In this example, we describe some symptoms whose root cause cannot be determined without a core dump. Let's assume you have a .NET application running in a Windows Diego container, and the app suddenly crashes with this error:
2017-02-02T15:51:14.64-0800 [API/0] OUT App instance exited with guid cf9f685d-2562-4cb4-a9b0-834451b88c13 payload: {"instance"=>"", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"2 error(s) occurred:\n\n* Exited with status -1073741819\n* cancelled", "crash_count"=>4, "crash_timestamp"=>1486079474619372130, "version"=>"276d4084-18ed-48fe-9aee-1b29e6525a8d"}
The interesting part of this error is the application exit status "Exited with status -1073741819". The value -1073741819 is the signed 32-bit interpretation of 0xC0000005 (STATUS_ACCESS_VIOLATION), which means "Access Violation". An access violation occurs when a process tries to read, write, or execute memory that it is not allowed to access, for example by dereferencing a null or already-freed pointer. We can get more information about the fault by checking the Windows Application event log.
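The conversion from the signed exit status to the NTSTATUS code can be checked with a few lines of Python. This is a minimal sketch that assumes the crash payload text looks like the log line above; the helper names `exit_status_from_log` and `to_ntstatus` are hypothetical, and the `KNOWN_CODES` table holds only the one code discussed in this article.

```python
import re

# Only the code discussed in this article; not a complete NTSTATUS table.
KNOWN_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def exit_status_from_log(line: str) -> int:
    """Extract the signed integer N from 'Exited with status N' in a cf log line."""
    match = re.search(r"Exited with status (-?\d+)", line)
    if match is None:
        raise ValueError("no exit status found in log line")
    return int(match.group(1))


def to_ntstatus(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as its unsigned NTSTATUS value."""
    return status & 0xFFFFFFFF


# Shortened copy of the payload above; '\\n' keeps the literal "\n" seen in the log.
log_line = ('payload: {"reason"=>"CRASHED", "exit_description"=>"2 error(s) occurred:'
            '\\n\\n* Exited with status -1073741819\\n* cancelled"}')
code = to_ntstatus(exit_status_from_log(log_line))
print(f"0x{code:08X}", KNOWN_CODES.get(code, "unknown"))  # 0xC0000005 STATUS_ACCESS_VIOLATION
```

Masking with 0xFFFFFFFF is the usual way to turn a negative 32-bit value into its two's-complement bit pattern in Python, whose integers are unbounded.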
The event log entry shows that the access violation occurred in iisfreb.dll at offset 0x67da. Using WinDbg, we can open that DLL and resolve offset 0x67da to the function that contains it:
PS> & 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\windbg.exe' -z C:\Windows\System32\inetsrv\iisfreb.dll
0:000> lm -v
    CheckSum:         00037232
    ImageSize:        0002C000
    File version:     8.5.9600.16384
    Product version:  8.5.9600.16384
0:000> lm
start             end                 module name
00000001`80000000 00000001`8002c000   iisfreb    (pdb symbols)    C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\sym\iisfreb.pdb\89CF8B470B1B48BA829E6B4C6A27A7391\iisfreb.pdb
0:000> ln 180000000+67da
(00000001`8000672c)   iisfreb!FREB_REQUEST_CONTEXT::FilterWWWServerAreasAndVerbosity+0xae   |  (00000001`80006844)   iisfreb!FREB_REQUEST_CONTEXT::SerializeAllTraceEventsToLogDataString
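The address passed to `ln` is the module base address reported by `lm` (0x180000000) plus the fault offset from the event log (0x67da). The sketch below redoes that arithmetic and checks it against the `ln` output: the nearest preceding symbol starts at 0x18000672c, and 0x18000672c + 0xae lands on the same address. The constants are copied from the WinDbg session above; the helper name `symbol_offset` is hypothetical.

```python
MODULE_BASE = 0x1_8000_0000   # 'start' column for iisfreb in the lm output
FAULT_OFFSET = 0x67DA         # offset reported in the Application event log
SYMBOL_START = 0x1_8000_672C  # FilterWWWServerAreasAndVerbosity, from ln
NEXT_SYMBOL = 0x1_8000_6844   # SerializeAllTraceEventsToLogDataString, from ln


def symbol_offset(address: int, symbol_start: int, next_symbol: int) -> int:
    """Return the offset of address within the half-open range [symbol_start, next_symbol)."""
    if not symbol_start <= address < next_symbol:
        raise ValueError("address is not between the two symbols")
    return address - symbol_start


fault_address = MODULE_BASE + FAULT_OFFSET
print(hex(fault_address))                                            # 0x1800067da
print(hex(symbol_offset(fault_address, SYMBOL_START, NEXT_SYMBOL)))  # 0xae
```

The result, 0xae, matches the `+0xae` that WinDbg prints after the function name, which confirms the fault lies inside FilterWWWServerAreasAndVerbosity rather than in the following function.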
In some cases this is enough information to determine the root cause. If more information is required, we can use DebugDiag to trigger a core dump when this access violation occurs.
Using the DebugDiag tool to trigger a core dump
In this example, we use a small .NET app called CpuBurner and show how to enable tracing for access violation errors on its process.
1. First, make sure we are targeting the right application. We can use the cf CLI to get the app GUID:
$ cf app cpuburner --guid
91f18699-87f9-43a3-95da-e06a8844795d
2. Next, on the Diego Cell, open Windows Task Manager -> right-click the process -> Open File Location.
3. Windows Explorer opens a path like C:\containerizer\BCC2AB46FF4649B4FE\user\app. The directory name BCC2AB46FF4649B4FE is the user name created for this container: Garden Windows creates a new user for each app container, and all processes in that container run under this user account.
4. If we open the ASCII text file C:\containerizer\BCC2AB46FF4649B4FE\private\properties in Notepad, we can see that the app GUID in "network.app_id":"91f18699-87f9-43a3-95da-e06a8844795d" matches the one returned by the CLI, so we know we are working with the correct app.
5. Using Task Manager, we can look up the process ID (PID) of the CpuBurner app and make a note of it.
6. Launch "DebugDiag 2 Collection" and use the Crash rule wizard to start tracing the CpuBurner process.
7. Then choose to trace a specific process.
8. Select the CpuBurner.exe instance whose PID matches the one we found in Task Manager.
9. On the next prompt, click Exceptions -> Add Exception, then fill in the Configure Exception form so that the "Full Userdump" action is triggered when the Access Violation exception code 0xc0000005 is encountered.
10. DebugDiag writes all core dumps and trace logs to the "C:\Program Files\DebugDiag\Logs\Crash rule for all instances of CpuBurner.exe" directory.
11. Once the rule is activated, reproduce the fault or wait for the problem to recur. The developer can then analyze the core dump with Microsoft tools such as WinDbg or DebugDiag 2 Analysis to determine the root cause of the fault.
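The app and process lookups in steps 1 through 5 can also be scripted while you are logged on to the Diego Cell. The sketch below is an illustration, not part of the original procedure: it assumes Python 3 is installed on the cell, assumes the properties file contains the `"network.app_id":"<guid>"` text shown in step 4, and uses the standard Windows `tasklist` command in place of Task Manager. All function names are hypothetical.

```python
import csv
import io
import re
import subprocess
from pathlib import Path
from typing import List, Optional

# Matches "network.app_id":"<guid>" as it appears in the container's properties file.
APP_ID_PATTERN = re.compile(r'"network\.app_id"\s*:\s*"([^"]+)"')


def find_app_id(properties_text: str) -> Optional[str]:
    """Return the app GUID recorded in a container properties file, if present."""
    match = APP_ID_PATTERN.search(properties_text)
    return match.group(1) if match else None


def container_matches_app(container_dir: Path, expected_guid: str) -> bool:
    """Compare the GUID in <container_dir>/private/properties with `cf app --guid` output."""
    properties = container_dir / "private" / "properties"
    text = properties.read_text(encoding="ascii", errors="replace")
    return find_app_id(text) == expected_guid.strip()


def pids_for_image(tasklist_csv: str, image_name: str) -> List[int]:
    """Parse `tasklist /FO CSV /NH` output and return the PIDs whose image name matches."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Skips blank lines and the single-field "INFO: No tasks ..." message.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def list_pids(image_name: str) -> List[int]:
    """Run tasklist (Windows only) filtered by image name and return the matching PIDs."""
    output = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return pids_for_image(output, image_name)
```

On the cell, a call such as `container_matches_app(Path(r"C:\containerizer\BCC2AB46FF4649B4FE"), "91f18699-87f9-43a3-95da-e06a8844795d")` followed by `list_pids("CpuBurner.exe")` mirrors steps 3 to 5. If several instances of the app run on the same cell, more than one PID may come back, and the container user name from step 3 is what tells them apart.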
Some helpful links are below: