How to enable core dump files for Greenplum Database (GPDB)

Article ID: 297045

Updated On:

Products

VMware Tanzu Greenplum, Greenplum, Pivotal Data Suite Non Production Edition, VMware Tanzu Data Suite

Issue/Introduction

Core dumps are very useful for debugging a database panic or crash. It is recommended to enable core file generation on the master/coordinator and segment servers of Greenplum Database clusters.

This article covers how to configure the cluster to generate core dump files.

Resolution

Set "ulimit -c" for gpadmin user to be "unlimited"

  1. Check current setting with "ulimit -c" on all hosts in the cluster
    [gpadmin@mdw ~]$ gpssh -f hostfile ulimit -c
    [sdw1] 0
    [sdw2] 0
    [ mdw] 0
  2. If it is not set to "unlimited" on each host, create the file /etc/security/limits.d/coredumps.conf on each host with the contents
    # Core file size set to unlimited for user gpadmin
    gpadmin - core unlimited
  3. Verify with "ulimit -c" on all hosts in the cluster
    [gpadmin@mdw ~]$ gpssh -f hostfile ulimit -c
    [sdw1] unlimited
    [sdw2] unlimited
    [ mdw] unlimited
    Note: Log out and back in to pick up the ulimit changes for the current login session.
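
The check in steps 1 and 3 can be scripted when the cluster has many hosts. The following is a minimal sketch, assuming the "[host] value" output format shown above; the function name hosts_not_unlimited is hypothetical and is not a Greenplum utility.

```shell
# Print the hosts whose reported core limit is not "unlimited".
# Reads gpssh-style lines such as "[sdw1] 0" or "[ mdw] unlimited" on stdin.
# hosts_not_unlimited is an illustrative helper, not a Greenplum utility.
hosts_not_unlimited() {
  sed -n 's/^\[ *\([^]]*\)] *\(.*\)$/\1 \2/p' |
    awk '$2 != "unlimited" { print $1 }'
}

# Example with made-up input in the format shown in step 1:
printf '%s\n' '[sdw1] 0' '[sdw2] unlimited' '[ mdw] 0' | hosts_not_unlimited
```

On a cluster, the input would come from "gpssh -f hostfile ulimit -c" piped into the function. Empty output means every host reports "unlimited".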

Set kernel parameters, Option 1.

  1. Create file /etc/sysctl.d/corefiles.conf on each host in the cluster with the contents
    kernel.core_uses_pid = 1
    # Replace <directory> with an appropriate location for the core files; they can be several GB in size
    kernel.core_pattern = /<directory>/core-%e-%s-%u-%g-%p-%t
    where:
    kernel.core_uses_pid = 1 - Appends the PID of the crashing process to the core file name when the pattern does not already contain %p.
    kernel.core_pattern = /<directory>/core-%e-%s-%u-%g-%p-%t - When a process terminates abnormally, the kernel writes a core file to <directory>. The kernel.core_pattern sysctl controls the exact location and name of the core file. The name template can contain % specifiers, which are replaced by the following values when a core file is created:
     %% - A single % character
     %p - PID of dumped process
     %u - real UID of dumped process
     %g - real GID of dumped process
     %s - number of signal causing dump
     %t - time of dump (seconds since 0:00h, 1 Jan 1970)
     %h - hostname (same as ’nodename’ returned by uname(2))
     %e - executable filename
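
To make the substitution concrete, the sketch below mimics how a template is filled in. It is illustrative only: expand_core_pattern is a hypothetical helper, it handles just the specifiers listed above, and it drops any other %-sequence. The sample values are the ones from the verification example in this article; the host name argument is unused by this pattern.

```shell
# Illustrative only: mimic how the kernel fills in a core_pattern template.
# expand_core_pattern is a hypothetical helper; it handles only the
# specifiers listed above and drops any other %-sequence.
# Usage: expand_core_pattern PATTERN EXE SIGNAL UID GID PID TIME HOST
expand_core_pattern() {
  awk -v pat="$1" -v e="$2" -v s="$3" -v u="$4" -v g="$5" \
      -v p="$6" -v t="$7" -v h="$8" '
    BEGIN {
      val["%"] = "%"; val["e"] = e; val["s"] = s; val["u"] = u
      val["g"] = g;   val["p"] = p; val["t"] = t; val["h"] = h
      out = ""
      n = length(pat)
      for (i = 1; i <= n; i++) {
        c = substr(pat, i, 1)
        if (c == "%") {
          i++                      # move to the specifier letter
          k = substr(pat, i, 1)
          if (k in val) out = out val[k]
        } else {
          out = out c
        }
      }
      print out
    }'
}

expand_core_pattern '/var/crash/core-%e-%s-%u-%g-%p-%t' \
  sleep 11 1000 1000 3040 1727959336 mdw
```

The call prints /var/crash/core-sleep-11-1000-1000-3040-1727959336, which is the file name that appears in the verification step.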

  2. Change the permissions on <directory> to "1777" to ensure gpadmin can save files to the directory (In the examples below, the <directory> is "/var/crash"):
    [root@mdw ~]$ gpssh -f hostfile chmod 1777 /var/crash
  3. Apply the changes as root user:
    [root@mdw ~]$ source /usr/local/greenplum-db/greenplum_path.sh
    [root@mdw ~]$ gpssh -f hostfile sysctl -p /etc/sysctl.d/corefiles.conf
    [ mdw] kernel.core_uses_pid = 1
    [ mdw] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
    [sdw1] kernel.core_uses_pid = 1
    [sdw1] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
    [sdw2] kernel.core_uses_pid = 1
    [sdw2] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
    Verify the settings:
    [root@mdw ~]$ gpssh -f hostfile sysctl kernel.core_uses_pid
    [sdw2] kernel.core_uses_pid = 1
    [ mdw] kernel.core_uses_pid = 1
    [sdw1] kernel.core_uses_pid = 1

    [root@mdw ~]$ gpssh -f hostfile sysctl kernel.core_pattern
    [ mdw-lab1] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
    [sdw2-lab1] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
    [sdw1-lab1] kernel.core_pattern = /var/crash/core-%e-%s-%u-%g-%p-%t
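
A core file is only written if the target directory exists and the crashing user can write to it. The following is a minimal sketch of such a check, assuming it is run as gpadmin on each host; check_core_dir is a hypothetical helper, and testing for the sticky bit plus write access only approximates the "1777" mode set in step 2.

```shell
# check_core_dir is a hypothetical helper: given a core_pattern value, it
# reports whether the target directory exists, has the sticky bit set, and
# is writable by the current user.
check_core_dir() {
  case "$1" in
    '|'*) echo "piped to a helper program, no directory to check"; return 0 ;;
  esac
  dir=$(dirname "$1")
  [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
  [ -k "$dir" ] || { echo "sticky bit not set: $dir"; return 1; }
  [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }
  echo "ok: $dir"
}

# Example invocation on a host, run as gpadmin:
#   check_core_dir "$(sysctl -n kernel.core_pattern)"
```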

Set kernel parameters, Option 2 (using systemd-coredump)

See "Using systemd-coredump to debug application crashes". The documentation is written for SUSE Linux, but it also applies to other Linux distributions.
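
To tell which of the two options is active on a host, the current kernel.core_pattern value can be inspected. The following is a minimal sketch, assuming that systemd-coredump is registered through a pattern that starts with "|" and names the systemd-coredump binary; core_handler is a hypothetical helper.

```shell
# core_handler is a hypothetical helper that classifies a core_pattern value.
# A leading "|" means the kernel pipes the dump to a program instead of
# writing a file directly.
core_handler() {
  case "$1" in
    '|'*systemd-coredump*) echo "systemd-coredump" ;;
    '|'*)                  echo "other pipe handler" ;;
    *)                     echo "file" ;;
  esac
}

# Example invocation on a host:
#   core_handler "$(sysctl -n kernel.core_pattern)"
```

When the result is "systemd-coredump", the dumps are managed by systemd and can be listed with "coredumpctl list" instead of being looked up in a directory.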

Verify the creation of core dumps

Run a simple process like "sleep 600 &" in the background and kill the process with "kill -11 <PID>" to generate a coredump file. For example:

[gpadmin@mdw ~]$ sleep 600 &
[1] 3040

[gpadmin@mdw ~]$ kill -11 3040
[gpadmin@mdw ~]$
[1]+  Segmentation fault      (core dumped) sleep 600

[gpadmin@mdw ~]$ ls -l /var/crash/core-*
-rw------- 1 gpadmin gpadmin 385024 Oct  3 13:42 /var/crash/core-sleep-11-1000-1000-3040-1727959336
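
The same test can be scripted so that it can be repeated on every host. This is a minimal sketch, assuming the core-%e-%s-%u-%g-%p-%t pattern from Option 1; the CORE_DIR variable is an assumption that defaults to the /var/crash example directory.

```shell
# Scripted version of the manual test above. CORE_DIR is an assumption that
# matches the /var/crash example; adjust it to the configured <directory>.
CORE_DIR=${CORE_DIR:-/var/crash}

sleep 600 &
pid=$!
sleep 1                      # give the child time to exec "sleep"
kill -11 "$pid"              # signal 11 is SIGSEGV
status=0
wait "$pid" || status=$?     # 128 + 11 = 139 when killed by SIGSEGV

echo "exit status: $status"
if ls "$CORE_DIR"/core-sleep-11-*-"$pid"-* >/dev/null 2>&1; then
  echo "core file found in $CORE_DIR"
else
  echo "no core file found in $CORE_DIR"
fi
```

An exit status of 139 only confirms that the process was killed by signal 11; the presence of the file is what confirms that core dumps are enabled.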

Restart Greenplum Database if "ulimit" was changed for the gpadmin user

Greenplum Database must be restarted for the ulimit changes to take effect.

Log out and log back in as gpadmin before restarting the database, so that the new login session and the database processes started from it pick up the updated limits.
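
After the restart, one way to confirm that the running database processes picked up the new limit is to read the limits that the kernel reports for them. The following is a minimal sketch, assuming a Linux host where /proc/<PID>/limits is available; core_soft_limit is a hypothetical helper, and <PID> stands for the PID of a running Greenplum postgres process.

```shell
# core_soft_limit is a hypothetical helper: print the soft "Max core file
# size" value from a Linux /proc/<PID>/limits file.
core_soft_limit() {
  awk '/^Max core file size/ { print $5 }' "$1"
}

# Example invocation, where <PID> is the PID of a running Greenplum
# postgres process on the host:
#   core_soft_limit /proc/<PID>/limits
```

A value of "unlimited" indicates that the process was started with the new setting; "0" indicates that it is still running with the old limit.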