Smarts: How do I set MALLOC_CHECK_ environment variable for heap memory debugging of Smarts environment?

Article ID: 306622

Products

VMware Smart Assurance

Issue/Introduction

Symptoms:


This article explains how to set the MALLOC_CHECK_ environment variable to debug heap memory corruption in a Smarts environment.

A stack trace similar to the following appears in the logs or core files of the Smarts environment:

Thread 208 (Thread 1157204288 (LWP 2576)):
#0  0x0000003f7500d9eb in read () from /lib64/libpthread.so.0
#1  0x00002b8e5788a5e5 in _lineread (fd=4, buf=0x44f92ac0 "", bufsz=<value optimized out>) at /work/blackcurrent/DMT-9.0.2.X/7/smarts/platform/misc/posix/POSIX_stackdump.c:699
#2  0x00002b8e5788a79b in sm_POSIXstacktrace (fd=-1, stdinfo=<value optimized out>, line=453, file=0x2b8e57915070 "/work/blackcurrent/DMT-9.0.2.X/7/smarts/platform/misc/posix/POSIX_stackdump.c") at /work/blackcurrent/DMT-9.0.2.X/7/smarts/platform/misc/posix/POSIX_stackdump.c:531
#3  0x00002b8e5788b1ad in sm_LINUXstacktrace (fd=-1, stdinfo=0) at /work/blackcurrent/DMT-9.0.2.X/7/smarts/platform/misc/posix/POSIX_stackdump.c:453
#4  0x00002b8e57886a68 in fatalHandler (sig=11, info=0x44f96230, context=<value optimized out>) at /work/blackcurrent/DMT-9.0.2.X/7/smarts/platform/thread/posix/sthread.c:926
#5  0x00002aaabc36345a in call_chained_handler () from /opt/InCharge8/SAM/smarts/jre/lib/amd64/server/libjvm.so
#6  0x00002aaabc3603fb in os::Linux::chained_handler () from /opt/InCharge8/SAM/smarts/jre/lib/amd64/server/libjvm.so
#7  0x00002aaabc363f40 in JVM_handle_linux_signal () from /opt/InCharge8/SAM/smarts/jre/lib/amd64/server/libjvm.so
#8  0x00002aaabc36030e in signalHandler () from /opt/InCharge8/SAM/smarts/jre/lib/amd64/server/libjvm.so
#9  <signal handler called>
#10 malloc_consolidate (av=0x2b8e578245c0) at malloc.c:4874
#11 0x00002b8e5761e47b in _int_malloc (av=0x2aaade075080, bytes=<value optimized out>) at malloc.c:4290
#12 0x00002b8e57621e1e in malloc (bytes=776) at malloc.c:3655
#13 0x0000003f75cbd25d in operator new () from /usr/lib64/libstdc++.so.6
#14 0x0000003f75cbd379 in operator new[] () from /usr/lib64/libstdc++.so.6
#15 0x00002aaaac80cd05 in CI_Sequence_U<MR_MonitoringSystem::MR_ValueChange>::resize (this=0x44f967b0, nSize=16) at /work/blackcurrent/DMT-9.0.2.X/7/smarts/install/linux_rhAS50-x86-64/optimize/include/clsapi/ci_sequence_t.h:90
#16 0x00002aaaac808713 in MR_MonitoringLog::commit_end (this=0x1b75f180) at /work/blackcurrent/DMT-9.0.2.X/7/smarts/install/linux_rhAS50-x86-64/optimize/include/clsapi/ci_sequence.h:77
#17 0x00002aaaaca55785 in MR_LogManager::commit_end_st () at /work/blackcurrent/DMT-9.0.2.X/7/smarts/repos/mr/log.c:311



Environment

VMware Smart Assurance - SMARTS
VMware Smart Assurance - NCM

Resolution

On the server where the Smarts SAM/IP or other Smarts domain or broker resides, add or edit the MALLOC_CHECK_ setting in the runcmd_env.sh file. This enables the diagnostic logging for heap memory corruption that is built into glibc. Use the following steps:

IMPORTANT! Setting the MALLOC_CHECK_ variable can reduce system performance by as much as 25% while it is in effect. When data collection is finished, set the MALLOC_CHECK_ variable back to 0; otherwise, the setting may continue to cause performance issues.

  1. From the <Basedir>/smarts/bin directory, run the following command:

    sudo ./sm_edit <Basedir>/local/conf/runcmd_env.sh
     
  2. Add or edit the following line in the runcmd_env.sh file, specifying the value that corresponds to the level of logging needed. The following example sets MALLOC_CHECK_ to 3 (the values are described in the "MALLOC_CHECK_ variable values" section):

    export MALLOC_CHECK_=3 
     
  3. Save your changes and restart the SAM/IP/SMARTS domain or broker.
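
After the restart, it can be useful to confirm that the process actually picked up the variable. The following is a minimal sketch, not a Smarts tool: the helper name show_malloc_check is hypothetical, and it assumes a Linux host where /proc/<pid>/environ is readable (by the process owner or root). That file reflects the environment the process was started with.

```shell
# Hypothetical helper: print the MALLOC_CHECK_ entry from a running process's environment.
# Linux-only; reading /proc/<pid>/environ requires the process owner or root.
show_malloc_check() {
    pid=$1
    # Entries in /proc/<pid>/environ are NUL-separated; convert them to one per line.
    tr '\0' '\n' < "/proc/$pid/environ" | grep '^MALLOC_CHECK_='
}

# Example: inspect the current shell ($$). Replace $$ with the PID of the
# restarted domain or broker process.
show_malloc_check "$$" || echo "MALLOC_CHECK_ is not set for PID $$"
```

If the helper prints nothing for the restarted process, the edit to runcmd_env.sh was probably not saved or the process was not restarted.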

MALLOC_CHECK_ variable values
The MALLOC_CHECK_ variable can have the following values and logging levels:

  • 0: Any detected heap corruption is silently ignored and an error message is not generated.  This gives a higher tolerance for errors caused by memory heap corruption.
  • 1: The error message is printed on stderr, but the program is not aborted.
  • 2: abort() is called immediately, but the error message is not generated.
  • 3: Combines settings 1 and 2. The error message is printed on stderr and the program is aborted. This is useful because otherwise the crash may happen much later, when the true cause of the problem is very hard to track down.
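
The four values above can be read as two independent flags: 1 means "print the message" and 2 means "abort", so 3 is simply both. The following sketch decodes a value that way. The function name malloc_check_actions is hypothetical, it covers only the values 0 through 3 listed above, and the exact behavior may differ between glibc versions.

```shell
# Hypothetical helper: decode a MALLOC_CHECK_ value (0-3) into the actions listed above.
malloc_check_actions() {
    value=$1
    actions=""
    # Bit 0 (value 1): print the error message on stderr.
    if [ $(( value & 1 )) -ne 0 ]; then actions="print"; fi
    # Bit 1 (value 2): call abort().
    if [ $(( value & 2 )) -ne 0 ]; then actions="${actions:+$actions }abort"; fi
    # Neither bit set (value 0): corruption is silently ignored.
    echo "${actions:-ignore}"
}

malloc_check_actions 3   # prints "print abort"
```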