Reading a core file generated on UNIX or Linux

Products

CA Harvest Software Change Manager - OpenMake Meister
Autosys Workload Automation
CA Workload Automation Advanced Integration for Hadoop

Issue/Introduction

When a program terminates abnormally and leaves a core file, is it OK to just supply that file to Support? Usually, the cause of the fault can be found from the core file. If a debugger is installed on the machine where the fault occurred, reading the core file there helps Support isolate the issue more easily.

Environment

Harvest Software Change Manager v13.x and up

Resolution

Core dumps usually contain valuable information. This article explains some ways to extract it on the machine where the core was produced.

STEP A: How to capture the core file

Not all memory access errors (e.g., a segmentation fault) produce a core dump file. If you are receiving a memory access error but no core file is being created, there are a few system settings you can check to make sure that a core file can be created and saved.

Verify ulimit:

The ulimit command allows you to view and set system/resource limitations for the user shell where a process is invoked, or the child shells of that shell. For example:
```
$ ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 4
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited
$ ulimit -H -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 4
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited
```
The two commands above show the soft limits currently in effect for the user's shell ( ulimit -a ) and the hard limits, which are the ceilings that the soft limits can be raised to ( ulimit -H -a ). In the example, the soft limit for core file size is 0, so even if a memory access error triggered a core dump, no core file would be written. You can change this with ulimit -c XXXXXX, where XXXXXX is the new maximum core file size (or unlimited), up to the hard limit. This is nearly the same on all UNIX platforms (refer to the man pages for ulimit for more details).
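
As a minimal sketch of the check described above, the following raises the soft core file limit to the hard limit in the current shell. It assumes a POSIX-style shell whose ulimit builtin accepts -S and -H (bash and ksh do); the variable names are illustrative only.

```shell
# Show the current soft and hard limits for core file size.
soft_core=$(ulimit -S -c)
hard_core=$(ulimit -H -c)
echo "soft core limit: $soft_core, hard core limit: $hard_core"

# Raise the soft limit to the hard limit for this shell and its children.
# A non-root user cannot go above the hard limit.
ulimit -S -c "$hard_core"

# Confirm the new value before restarting the failing program from this shell.
echo "soft core limit is now: $(ulimit -S -c)"
```

The change is lost when the shell exits, so the program has to be restarted from this same shell for the new limit to apply.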

Once this change is made, it usually applies only to processes started from that point onwards by the shell where the change was made, and by its child shells. For your program (the one that produced the core dump) to pick up the change, you have to restart it from this shell. If the change was made in the UNIX user profile, a logout/login makes it effective as well. Then re-execute the program to reproduce the memory access error.

File sizes:

It's also possible that the system tried to create the core dump, but the file exceeded a resource limit on the file system. This can happen if the core file is larger than the permitted file size. This limit can also be adjusted with ulimit (try the file size limit parameter, ulimit -f XXXXXXX). Check for free disk space and other system limitations as well.
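
The two checks above can be sketched as follows. DUMP_DIR is a placeholder introduced for this example (it defaults to the current directory); point it at the directory where the core file is expected, which is often the working directory of the failing process.

```shell
# Directory where the core file is expected. Placeholder for this sketch.
DUMP_DIR=${DUMP_DIR:-.}

# Soft limit on the size of files this shell may create
# ("unlimited" or a number of blocks).
file_limit=$(ulimit -S -f)
echo "file size limit: $file_limit"

# Free space, in kilobytes, on the file system that holds DUMP_DIR.
# df -P -k prints one header line and one data line; column 4 is "Available".
free_kb=$(df -P -k "$DUMP_DIR" | awk 'NR == 2 { print $4 }')
echo "free space in $DUMP_DIR: ${free_kb} KB"
```

If the free space is smaller than the memory footprint of the process, the core file is likely to be truncated or missing.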

Core administration tools:

Some flavors of UNIX have specific core administration tools (e.g., coreadm on Solaris) to manage core file creation. Verify the settings in such tools and configure them to allow the creation of the core file.

Sample coreadm output:

     global core file pattern:
     init core file pattern: core
     global core dumps: disabled
     per-process core dumps: enabled
     global setid core dumps: disabled
     per-process setid core dumps: disabled
     global core dump logging: disabled
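
A small sketch for viewing the current core file configuration is shown below. The function name show_core_config is made up for this example. On Solaris it calls coreadm with no arguments, which prints the current settings; as an assumption beyond this article, on Linux it reads the kernel's core_pattern setting instead.

```shell
# Report how the operating system is configured to write core files.
show_core_config() {
    if command -v coreadm >/dev/null 2>&1; then
        # Solaris: with no arguments, coreadm prints the current settings.
        coreadm
    elif [ -r /proc/sys/kernel/core_pattern ]; then
        # Linux: the kernel's core file name pattern.
        echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    else
        echo "no known core administration interface found"
    fi
}

show_core_config

# On Solaris, an administrator could enable global core dumps and set a
# pattern along these lines (run as root; /var/core is a placeholder path):
#   coreadm -g /var/core/core.%f.%p -e global
```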

Cleanup scripts removing core files:

Since core dump files can be quite large, UNIX administrators often run nightly jobs to clean up any core files that are created each day. Check with your UNIX admin to see if such a job is run. Log files from cron tasks may help.
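
One rough way to look for such a job is to scan a crontab for entries that remove files named core. The function below is a hypothetical helper for this purpose; its pattern is only a heuristic and will miss cleanup logic hidden inside scripts that cron calls.

```shell
# Print crontab lines that look like they delete core files.
# Reads crontab text on standard input and skips comment lines.
find_core_cleanup_jobs() {
    grep -v '^[[:space:]]*#' | grep -E '(rm|find)[[:space:]].*core'
}

# Typical use, for the current user's crontab:
#   crontab -l | find_core_cleanup_jobs
```

Jobs owned by root or other users need to be checked by someone with access to those crontabs.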

System auditing/logs:

System audits and logs may contain information about the errors as well. Check the system logs for clues (refer to vendor documentation for details).

STEP B: I got hold of the core, now what?

Find the process name which created the core file. There are many ways to do this, but a simple one is to use the file command. Example:

file core.filename
core.filename: ELF-64 core file - PA-RISC 2.0 from 'bkrd' - received SIGABRT
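
If the process name is needed in a script, it can be pulled out of that output. The helper below, named parse_core_origin for this example, assumes the output contains the text from 'name' as in the sample above; the exact wording of the file output varies by platform, and on some systems the quoted text also includes command-line arguments.

```shell
# Extract the quoted program name from `file` output read on standard input.
# Prints nothing if the line has no "from '...'" part.
parse_core_origin() {
    sed -n "s/.*from '\([^']*\)'.*/\1/p"
}

# Typical use:
#   file core.filename | parse_core_origin
```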

STEP C: Identify the paths involved:

Now that you know the process name, find the full path of the executable in question. For Harvest, the actual libraries and executables are in the $HARVESTHOME/lib directory, while the files in $HARVESTHOME/bin are wrapper scripts. These wrapper scripts create a run-time environment with specific references to other libraries and then invoke the actual Harvest executables from the $HARVESTHOME/lib directory.

So, we would have to note down the path to the actual executables. In our bkrd case above, it would be $HARVESTHOME/lib/bkrd (assuming $HARVESTHOME is a valid variable referring to the location where Harvest was installed). Let's call it the $EXEC_PATHNAME for the purposes of this document.

Next, you need to note down the full path to the core file produced. Example: /home/harvest/core.filename. Let's call it the $CORE_PATHNAME for the purposes of this document.
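
As a sketch, the two paths can be recorded as shell variables and checked before a debugger is started. The default of /opt/harvest for $HARVESTHOME is only a placeholder for this example; the core path is the sample path used above.

```shell
# HARVESTHOME should point at the Harvest installation.
# /opt/harvest is only a placeholder default for this sketch.
HARVESTHOME=${HARVESTHOME:-/opt/harvest}

# Real executable (not the wrapper script in $HARVESTHOME/bin) and the core file.
EXEC_PATHNAME="$HARVESTHOME/lib/bkrd"
CORE_PATHNAME=/home/harvest/core.filename

# Warn early if either path is wrong, before starting the debugger.
[ -x "$EXEC_PATHNAME" ] || echo "warning: $EXEC_PATHNAME is not an executable file" >&2
[ -r "$CORE_PATHNAME" ] || echo "warning: $CORE_PATHNAME is not readable" >&2
```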

STEP D: Use a debugger to extract information from the core file:

Debuggers like dbx/adb/gdb can be used to extract information from core files. Usually, they work with a syntax such as this:

gdb $EXEC_PATHNAME $CORE_PATHNAME
adb $EXEC_PATHNAME $CORE_PATHNAME
dbx $EXEC_PATHNAME $CORE_PATHNAME

However, before any of the above commands is invoked, you have to set up the necessary environment so that the debugger can access the proper libraries. An easier way is to reuse the wrapper script, like this:

cp $HARVESTHOME/bin/bkrd /tmp/coreread
chmod u+w /tmp/coreread
open /tmp/coreread in your favorite text editor
delete the last line of the /tmp/coreread which usually reads: exec $HARVESTDIR/lib/bkrd

Add a new line at the end of the file /tmp/coreread to read something like:

gdb $EXEC_PATHNAME $CORE_PATHNAME
# where $EXEC_PATHNAME is full path to the executable that core dumped
# where $CORE_PATHNAME is full path to the core file
# instead of gdb, it could be the full path to dbx or adb, depending
# on which debugger you are using

Save the file
Exit your editor
Invoke the program: /tmp/coreread
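
The manual edit above could also be scripted. The function below, make_coreread, is a hypothetical helper: it copies the wrapper script without its last line and appends the debugger invocation. It deletes the last line unconditionally, so check first that the last line of the wrapper really is the exec line.

```shell
# Build a debugger launcher from a Harvest wrapper script.
# Arguments: wrapper script, executable path, core file path, output script.
make_coreread() {
    wrapper=$1
    exec_path=$2
    core_path=$3
    out=$4

    # Copy everything except the last line, which usually execs the real binary.
    sed '$d' "$wrapper" > "$out" || return 1

    # Append the debugger invocation in its place.
    # Swap gdb for dbx or adb if that is the debugger on this machine.
    printf 'gdb %s %s\n' "$exec_path" "$core_path" >> "$out"

    # Make the new script writable and executable for its owner.
    chmod u+wx "$out"
}

# Example:
#   make_coreread "$HARVESTHOME/bin/bkrd" "$EXEC_PATHNAME" "$CORE_PATHNAME" /tmp/coreread
#   /tmp/coreread
```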

Now you should be at the debugger prompt and able to perform post-mortem operations such as back tracing and displaying the stack as it was when the core dump happened. Such information can help CA Engineering isolate the reason behind the fault in the program.

This information can be supplied to CA Tech Support for further analysis, along with additional information such as the platform configuration and the Harvest configuration, version, and patch level.

Other Relevant Information:

Note (1):
Refer to the specific vendor's documentation for additional information about each UNIX/LINUX platform/debugger that's being used.

Note (2):
Important commands at various debugger prompts:

At the dbx prompt:

main
where
quit

At the adb prompt:

$c
$q

At the gdb prompt:

bt
where
quit

Other options can be used as well. The above are just given as examples.
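
For gdb, the backtrace can also be written to a file without an interactive session, which makes it easier to attach to a support case. This is a sketch only: capture_backtrace and the GDB override variable are names made up for this example, and the command still needs the library environment that the wrapper script sets up.

```shell
# Write a backtrace from a core file to a text file, non-interactively.
# Arguments: executable path, core file path, report file.
# Set GDB to the debugger's full path if gdb is not on the PATH.
capture_backtrace() {
    exec_path=$1
    core_path=$2
    report=$3

    # -batch exits after the commands finish; -ex runs one gdb command.
    "${GDB:-gdb}" -batch -ex bt "$exec_path" "$core_path" > "$report" 2>&1
}

# Example:
#   capture_backtrace "$EXEC_PATHNAME" "$CORE_PATHNAME" /tmp/bkrd_backtrace.txt
```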

Note (3): Contact Support in case of any questions. The above operations require some knowledge of computer programming and systems administration.