The Breakpad project is an open-source framework for crash reporting. In Impala 2.6 and higher, Impala can use Breakpad to record stack information and register values when any of the Impala-related daemons crashes due to an error such as SIGSEGV or an unhandled exception.
The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little
memory, which improves reliability if the crash occurs while the system is low on memory.
By default, a minidump file is generated when an Impala-related daemon crashes. To turn off minidump generation, use one of the following options:
Set the --enable_minidumps configuration setting to false. Restart the corresponding services or daemons.
Set the --minidump_path configuration setting to an empty string. Restart the corresponding services or daemons.
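As a rough illustration, the two settings named above might appear as follows in a startup flag file. This is a sketch that assumes the daemon reads a gflags-style flag file (one flag per line, `#` for comments), like the impalad_flags file passed through --flagfile in the example later in this section. Only one of the two options is needed.

```
# Option 1: turn off minidump generation explicitly.
--enable_minidumps=false

# Option 2 (instead of option 1): set the minidump path to an empty string.
--minidump_path=
```

Either change takes effect only after the corresponding service or daemon is restarted.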
In Impala 2.7 and higher, you can send a SIGUSR1 signal to any Impala-related daemon to write a Breakpad minidump. For advanced troubleshooting, you can now produce a minidump without triggering a crash.
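A minimal sketch of sending that signal from a shell, assuming Impala 2.7 or higher and a user allowed to signal the daemon (typically root or the impala user). The function name request_minidump is hypothetical, and the sketch assumes pgrep is available to look up process IDs by exact process name.

```shell
# Send SIGUSR1 to every running process with the given exact name.
# request_minidump is an illustrative helper, not part of Impala.
request_minidump() {
  daemon_name="$1"
  # pgrep -x matches the process name exactly, for example "impalad".
  pids=$(pgrep -x "$daemon_name") || {
    echo "no running $daemon_name process found" >&2
    return 1
  }
  for pid in $pids; do
    kill -USR1 "$pid" && echo "sent SIGUSR1 to $daemon_name (pid $pid)"
  done
}
```

For example, `request_minidump impalad` signals each impalad process on the host. The daemon keeps running, and the new minidump appears in the configured minidump directory.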
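By default, minidump files are written to the following location on the host where the crash occurs:
Clusters not managed by cluster management software: impala_log_dir/daemon_name/minidumps/daemon_name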
To specify a different location, set the minidump_path configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.
If you specify a relative path for this setting, the value is interpreted relative to the default minidump_path directory.
As with any files used for logging or troubleshooting, consider limiting the number of minidump files, or removing unneeded ones, depending on the amount of free storage space on the hosts in the cluster.
Because the minidump files are only used for problem resolution, you can remove any such files that are not needed to debug current issues.
To control how many minidump files Impala keeps at any one time, set the max_minidumps configuration setting for one or more Impala-related daemons, and restart the corresponding services or daemons. The default for this setting is 9. A zero or negative value is interpreted as "unlimited".
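For one-off manual cleanup of files that are no longer needed, something like the following sketch could be used. It is not how Impala enforces max_minidumps internally. The function name prune_minidumps is hypothetical, and the sketch assumes minidump file names end in .dmp and contain no whitespace, as in the example file names in this section.

```shell
# Keep only the newest $2 .dmp files in directory $1 and delete the rest.
# A zero or negative count means "keep everything", mirroring the
# "unlimited" meaning described for max_minidumps.
prune_minidumps() {
  dir="$1"
  keep="$2"
  if [ "$keep" -le 0 ]; then
    return 0
  fi
  # ls -1t lists newest first; tail skips the first $keep entries.
  ls -1t "$dir"/*.dmp 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}
```

For example, `prune_minidumps /var/log/impala-minidumps/impalad 3` would leave the three most recently modified minidumps in that directory.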
The Impala log files record each crash event that generates a minidump file. Because each restart begins a new log file, the "crashed" message is always at or near the bottom of the log file. There might be another later message if core dumps are also enabled.
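One way to find which log files recorded such an event is to search for the "Wrote minidump to" message that appears in the sample logs later in this section. The sketch below assumes log file names contain ".log." as in that example, which also keeps grep away from large core files in the same directory. The function name logs_with_minidump is hypothetical.

```shell
# Print the names of log files in directory $1 that contain a
# minidump message. The *.log.* glob skips core files and other
# non-log files that might sit in the same directory.
logs_with_minidump() {
  grep -l 'Wrote minidump to' "$1"/*.log.* 2>/dev/null
}
```

For example, `logs_with_minidump /var/log/impalad` lists each log file in which a minidump was reported.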
The following example uses the command kill -11 to simulate a SIGSEGV crash for an impalad process on a single DataNode, then examines the relevant log files and minidump file.
First, as root on a worker node, kill the impalad process with a SIGSEGV error. The original process ID was 23114.
# ps ax | grep impalad
23114 ? Sl 0:18 /opt/local/parcels/<parcel_version>/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
31259 pts/0 S+ 0:00 grep impalad
#
# kill -11 23114
#
# ps ax | grep impalad
31374 ? Rl 0:04 /opt/local/parcels/<parcel_version>/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
31475 pts/0 S+ 0:00 grep impalad
We locate the log directory underneath /var/log. There are .INFO, .WARNING, and .ERROR log files for the 23114 process ID. The minidump message is written to the .INFO file and the .ERROR file, but not the .WARNING file. In this case, a large core file was also produced.
# cd /var/log/impalad
# ls -la | grep 23114
-rw------- 1 impala impala 3539079168 Jun 23 15:20 core.23114
-rw-r--r-- 1 impala impala 99057 Jun 23 15:20 hs_err_pid23114.log
-rw-r--r-- 1 impala impala 351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
-rw-r--r-- 1 impala impala 29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
-rw-r--r-- 1 impala impala 228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
The .INFO log includes the location of the minidump file, followed by a report of a core dump. With the Breakpad minidump feature enabled, you might choose to disable core dumps or keep fewer of them.
# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
...
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libpthread.so.0+0xb68a] pthread_cond_wait+0xca
#
# Core dump written. Default location: /var/log/impalad/core or core.23114
#
# An error report file with more information is saved as:
# /var/log/impalad/hs_err_pid23114.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
...
# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
Log file created at: 2016/06/23 14:03:43
Running on machine:.worker_node_123
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
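Because the same message appears in both the .INFO and .ERROR files, a small filter can pull out each minidump path once. This is a sketch that assumes the message starts at the beginning of a line, as it does in the log output above. The function name minidump_paths is hypothetical.

```shell
# Print each distinct minidump path mentioned in the given log files.
minidump_paths() {
  # sed prints only matching lines with the message prefix removed;
  # sort -u collapses the duplicate from the .INFO and .ERROR files.
  sed -n 's/^Wrote minidump to //p' "$@" | sort -u
}
```

For example, passing both the .INFO and .ERROR files for process 23114 would print the single .dmp path shown above.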
The resulting minidump file is much smaller than the corresponding core file, making it much easier to supply diagnostic information to the appropriate support channel.
# pwd
/var/log/impalad
# cd ../impala-minidumps/impalad
# ls
0980da2d-a905-01e1-25ff883a-04ee027a.dmp
# du -kh *
2.4M 0980da2d-a905-01e1-25ff883a-04ee027a.dmp
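To put a number on the size difference, the byte counts of the two files can be compared directly. The following is a minimal sketch. The function name size_ratio is hypothetical, and the result is an integer quotient, so it is only a rough indication.

```shell
# Print how many times larger file $1 is than file $2 (integer division).
size_ratio() {
  big=$(wc -c < "$1") || return 1
  small=$(wc -c < "$2") || return 1
  if [ "$small" -le 0 ]; then
    echo "second file is empty" >&2
    return 1
  fi
  echo $((big / small))
}
```

With the files from this example, the call would be `size_ratio /var/log/impalad/core.23114 0980da2d-a905-01e1-25ff883a-04ee027a.dmp`. Given the sizes listed above, the core file is on the order of a thousand times larger than the minidump.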