By default, the Pivotal HD (PHD) service daemons use various log4j file appenders. Some of these appenders do not let the user control how much data a Hadoop daemon log generates, so the sysadmin must manage and maintain the generated log data.
This article explains how to configure log4j.properties for each PHD core component so that sysadmins can understand and control PHD daemon log management.
Refer to the log4j Javadocs for the appenders below; sample configuration parameters follow each one:
DailyRollingFileAppender_DRFA

# Daily Rolling File Appender - rollover at midnight
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
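Because DRFA creates one file per day with no size cap or backup limit, old daily files accumulate until removed by hand. A hypothetical cleanup helper (not part of PHD; the directory, name pattern, and retention period are assumptions to adjust for your site) could list deletion candidates for a cron job:

```shell
# Hypothetical helper (not shipped with PHD): list DRFA-rotated
# daily log files older than a given number of days.
prune_candidates() {
    dir=$1
    days=$2
    # DRFA names rotated files <log>.<DatePattern>, e.g. hadoop.log.2014-01-01
    find "$dir" -type f -name '*.log.*' -mtime +"$days"
}

# Example: prune_candidates /var/log/gphd/hadoop-hdfs 30
# Pipe the result to xargs rm only after reviewing the list.
```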
RollingFileAppender_RFA

# Rolling File Appender - cap space usage at 256MB
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
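With the settings above, the worst-case disk footprint of one RFA-managed log is the live file plus MaxBackupIndex rotated copies. A quick back-of-the-envelope check:

```shell
# Worst-case disk usage for one RFA log file:
# MaxFileSize * (MaxBackupIndex + 1) = 256 MB * 21 = 5376 MB.
maxfilesize_mb=256
maxbackupindex=20
total_mb=$(( maxfilesize_mb * (maxbackupindex + 1) ))
echo "${total_mb} MB"
```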
FileAppender

# File Appender
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
ConsoleAppender

# Console appender options
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.appender.console.encoding=UTF-8
The following table lists the root logger environment variable for each service and where to override it:
Variable | Services | Where to Override
HADOOP_ROOT_LOGGER=INFO,RFA | Namenode, Journalnode, ZKFC, Datanode, Secondary Namenode | /etc/gphd/hadoop/conf/hadoop-env.sh
HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA | MapReduce History Server | /etc/gphd/hadoop/conf/mapred-env.sh
YARN_ROOT_LOGGER=INFO,RFA | Resourcemanager | /etc/gphd/hadoop/conf/yarn-env.sh
ZOO_LOG4J_PROP=INFO,ROLLINGFILE | Zookeeper | /etc/gphd/zookeeper/conf/java.env
HBASE_ROOT_LOGGER=INFO,RFA | HBase Master | /etc/gphd/hbase/conf/hbase-env.sh
Every service defines a $<SERVICE>_LOG_DIR variable in /etc/default/<service>. For example, the DataNode service sets HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs, so all DataNode logs are found under /var/log/gphd/hadoop-hdfs.
[gpadmin@hdw1 ~]$ cat /etc/default/hadoop-hdfs-datanode | egrep ^export
export HADOOP_PID_DIR=/var/run/gphd/hadoop-hdfs
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_IDENT_STRING=hdfs
These Daemons source their log4j settings from the following location: /etc/gphd/hadoop/conf/log4j.properties.
The HADOOP_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop/sbin/hadoop-daemon.sh, which sets the root logger to the RollingFileAppender by default. It can be overridden in /etc/gphd/hadoop/conf/hadoop-env.sh:
export HADOOP_ROOT_LOGGER=INFO,RFA
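The value is a comma-separated pair: the log level, then the appender name defined in log4j.properties. The daemon script passes it through to the JVM as -Dhadoop.root.logger=<value>. Splitting it in shell (a sketch for illustration, not PHD code):

```shell
# The logger setting is "<level>,<appender>"; either half can be
# changed independently, e.g. DEBUG,RFA or INFO,DRFA.
HADOOP_ROOT_LOGGER=INFO,RFA
level=${HADOOP_ROOT_LOGGER%%,*}     # log4j level   -> INFO
appender=${HADOOP_ROOT_LOGGER#*,}   # appender name -> RFA
```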
Audit logging uses the DRFAS appender, as set by the hadoop.security.logger property; this is configured in /etc/gphd/hadoop/conf/hadoop-env.sh through the HADOOP_NAMENODE_OPTS, HADOOP_DATANODE_OPTS, and HADOOP_SECONDARYNAMENODE_OPTS environment variables.
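A representative hadoop-env.sh fragment shows how the audit logger is wired through these variables. This is a sketch based on the stock Hadoop template, not copied from PHD; the exact lines in your install may differ:

```shell
# Sketch (assumption): pass the security/audit logger to the
# NameNode JVM via its _OPTS variable in hadoop-env.sh.
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-INFO,DRFAS}
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER} ${HADOOP_NAMENODE_OPTS}"
```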
The MapReduce History Server sources its log4j settings from /etc/gphd/hadoop/conf/log4j.properties.
The HADOOP_MAPRED_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh, which sets the MapReduce History Server logger to the RollingFileAppender by default. It can be overridden in /etc/gphd/hadoop/conf/mapred-env.sh:
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
These Daemons source their log4j settings from /etc/gphd/hadoop/conf/log4j.properties.
The YARN_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hadoop-yarn/sbin/yarn-daemon.sh, which sets the default logger to the RollingFileAppender. It can be overridden in /etc/gphd/hadoop/conf/yarn-env.sh:
export YARN_ROOT_LOGGER=INFO,RFA
Zookeeper sources log4j settings from /etc/gphd/zookeeper/conf/log4j.properties.
The ZOO_LOG4J_PROP environment variable controls the default logger. It is sourced in /usr/bin/zookeeper-server, which sets the default logger to the RollingFileAppender. It can be overridden by exporting the value in /etc/gphd/zookeeper/conf/java.env:
export ZOO_LOG4J_PROP=INFO,ROLLINGFILE
These Daemons source their log4j settings from /etc/gphd/hbase/conf/log4j.properties.
The HBASE_ROOT_LOGGER environment variable controls the default logger. It is sourced in /usr/lib/gphd/hbase/bin/hbase-daemon.sh, which sets the default logger to the RollingFileAppender. It can be overridden in /etc/gphd/hbase/conf/hbase-env.sh:
export HBASE_ROOT_LOGGER=INFO,RFA
Hive sources its log4j settings from /etc/gphd/hive/conf/hive-log4j.properties. In PHD, all Hive daemon logs source this file for hive.root.logger:
hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log
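With those defaults, each user who runs Hive gets a per-user log under /tmp. For example (gpadmin is just an illustrative user name, not fixed by Hive):

```shell
# ${user.name} is expanded by the JVM; mimicking the expansion in shell
# to show where a given user's log lands.
user=gpadmin                      # example user (assumption)
hive_log_dir="/tmp/${user}"       # hive.log.dir default
echo "${hive_log_dir}/hive.log"   # this user's hive.log path
```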
The file /etc/init.d/hive-server starts the Hive server and sets its log file name to hive-server.log. It uses the default hive.root.logger defined in hive-log4j.properties. This log file is truncated each time the Hive server daemon restarts.
NAME="hive-server"
LOG_FILE="/var/log/gphd/hive/${NAME}.log"
The file /etc/init.d/hive-metastore starts the Hive metastore and sets its log file name to hive-metastore.log. It uses the default hive.root.logger defined in hive-log4j.properties. This log file is truncated each time the Hive metastore daemon restarts.
NAME="hive-metastore"
LOG_FILE="/var/log/gphd/hive/${NAME}.log"

Both the hive-server and hive-metastore daemons also log their data to hive.log, as defined in hive-log4j.properties. The consolidated hive.log is rotated per hive.root.logger, which is set to DRFA in hive-log4j.properties.
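The truncation behavior described above is what you would see if the init script opened LOG_FILE with a plain > redirection rather than >> (an assumption about the script's internals; check your copy to confirm). The difference in one line:

```shell
# ">" truncates the target file on every open; ">>" would append.
LOG_FILE=$(mktemp)
echo "previous run" > "$LOG_FILE"
echo "current run"  > "$LOG_FILE"   # truncates: only this line survives
cat "$LOG_FILE"
```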
The history log file location is governed by hive.querylog.location in hive-site.xml. By default, this parameter is set to "/<hdfs-site.xml hadoop.tmp.dir>/${user.name}/".