What is the correct way to force an app to core dump and quit?
I just came across some code which used the kill system call to send a SIGSEGV signal to an app. The rationale behind this was that this would force the app to core dump and quit. This seems so wrong to me, is this normal practice?
SIGQUIT is the correct signal to send to a program if you wish to produce a core dump. kill is the correct command line program to send signals (it is of course poorly named, since not all signals will kill the program).
Note that you should not send random signals to the program, because not all of them will produce a core dump. Many of them will be handled by the program itself: consumed, ignored, or used to trigger other processing. Thus sending a SIGSEGV is wrong.
Unfortunately, the application could have installed a signal handler for SIGQUIT, preventing the default behaviour. In this case, you won’t get a core dump any more. Would you then, and only then, consider it legitimate to abuse one of the other signals that generate a core dump by default (SIGSEGV, SIGILL, ...)?
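As a rough illustration of the SIGQUIT approach from inside another program, here is a minimal sketch. It assumes a POSIX system; the function name quit_child_with_sigquit is made up for this example, and the forked child merely stands in for the target application. Whether a core file is actually written still depends on the system's core dump settings.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that only sleeps, sends it SIGQUIT with
// kill(), and returns the signal that terminated it (-1 on any error).
int quit_child_with_sigquit() {
    // Make sure the child starts with the default SIGQUIT action, whatever
    // this process inherited; the previous disposition is restored below.
    void (*previous)(int) = std::signal(SIGQUIT, SIG_DFL);

    pid_t pid = fork();
    if (pid == 0) {
        // Child: stands in for the target program, no SIGQUIT handler installed.
        for (;;) pause();
    }
    if (previous != SIG_ERR) std::signal(SIGQUIT, previous);
    if (pid < 0) return -1;

    // Parent: the programmatic equivalent of `kill -QUIT <pid>`.
    if (kill(pid, SIGQUIT) != 0) return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With the default action in place, a check such as assert(quit_child_with_sigquit() == SIGQUIT) should hold. If the target had installed its own SIGQUIT handler, the result would depend entirely on that handler.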
Yes. kill is somewhat misnamed — it can send any signal. There are many uses for kill which don’t result in the process being killed at all!
+1: kill -1 is commonly used to send SIGHUP, which traditionally causes programs written with this functionality to reload their configurations. (e.g.: sendmail, init, etc.)
If you want to make an application dump its core from another program, pretty much the only way to do it is via a signal. SEGV would be fine for this. Alternatively, you can attach a debugger to the program, freeze it, and view its registers and such without killing it.
If you want to dump a core from within an application there are nicer ways to do it, like via an assert().
So, no, it’s not particularly wrong to send a SEGV to a program. You could also send things like SIGILL for illegal instruction, or a divide by zero signal. It’s all fine.
No, it’s not fine. These signals all have specific meanings and programs respond to them differently. They won’t all kill the program, nor will they all produce core dumps.
The way to do it in Unix/Linux is to call abort(), which sends SIGABRT to the current process. The other option is raise(), which lets you specify which signal to send to the current process.
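A small sketch of both calls, assuming a POSIX system. The helper terminating_signal and the two action functions are invented for this example; each action runs in a forked child so that the calling program survives and can report which signal ended the child.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `action` in a forked child and reports which
// signal terminated that child (-1 if it was not killed by a signal).
int terminating_signal(void (*action)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        action();
        _exit(0);  // only reached if the action did not terminate the child
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

// abort() raises SIGABRT in the calling process.
void dump_with_abort() { std::abort(); }

// raise() lets the caller pick the signal; SIGQUIT is used here.
void dump_with_raise() {
    std::signal(SIGQUIT, SIG_DFL);  // make sure the default action applies
    std::raise(SIGQUIT);
}
```

For instance, assert(terminating_signal(dump_with_abort) == SIGABRT) and assert(terminating_signal(dump_with_raise) == SIGQUIT) should both hold. In real code you would call abort() or raise() directly at the point where the dump is wanted.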
Richard Stevens (Advanced Programming in the UNIX Environment) wrote:
The generation of core is an implementation feature of most Unix. It is not part of POSIX.1.
He lists 12 signals whose default action is to terminate with a core (ANSI: SIGABRT, SIGFPE, SIGILL, SIGSEGV, POSIX: SIGQUIT, Other: SIGBUS, SIGEMT, SIGIOT, SIGSYS, SIGTRAP, SIGXCPU, SIGXFSZ). All of them can be overridden; the only two signals that cannot be overridden are SIGKILL and SIGSTOP.
I’ve never seen a way to generate a core which isn’t the use of a default signal handler.
So if your goal is to generate a core and stop, the best is to choose a signal whose default handler does the job (SIGSEGV does the job), reset the default handler for the signal if you are using it and then use kill.
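A sketch of that recipe, assuming a POSIX system. The names app_segv_handler, dump_core_and_stop and run_reset_demo are hypothetical; the handler only simulates an application that already uses SIGSEGV for its own purposes, and the whole thing runs in a forked child so the result can be inspected.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for a handler the application installed for its own purposes.
extern "C" void app_segv_handler(int) { _exit(42); }

// Hypothetical helper: restore the default action for SIGSEGV, then send the
// signal to the calling process so that the default action (core) applies.
void dump_core_and_stop() {
    std::signal(SIGSEGV, SIG_DFL);
    kill(getpid(), SIGSEGV);
}

// Runs the recipe in a forked child and returns the signal that killed it
// (-1 if the child exited normally or something went wrong).
int run_reset_demo() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        std::signal(SIGSEGV, app_segv_handler);  // the "application" handler
        dump_core_and_stop();
        _exit(0);  // not reached if the signal terminated the child
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Here assert(run_reset_demo() == SIGSEGV) should hold. Without the reset, the child would have run app_segv_handler and exited with status 42 instead of dumping core.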
Abort (Core Dumped) in Linux for a C++ program that works in Visual Studio
I have a C++ project that was built and runs in Visual Studio. When I try to run it on Unix, it gives me "Abort (Core Dumped)". I am using g++ version 3.2.2. How do I fix this program? It needs to run on Linux.
First step is to learn how to use gdb or any of the other excellent debuggers for Linux.
That should be able to tell you exactly which source line caused the problem. Then work back from there.
Other than that, we can’t really help without seeing that source code. Psychic debugging, whilst useful, is not a highly developed field of endeavour 🙂
I second that. You can run gdb on the core (gdb binary_file core_file) to see where it crashed (definitely use the "backtrace" command to see the call stack), or you can first load the program with gdb (gdb binary_file) and then use the "run" command to start the program. The nice thing about the second way is that gdb has some features that only work when running the program. The pitfall is that some bugs are sensitive to the environment, and running within gdb makes them go away.
@All Thanks a lot for your responses. I really appreciate it.
My program worked with g++ 4.2.3. It was aborting with g++ 3.2.2.
The code that gave me the correct output in visual studio was
    foundOpen = inStr.find("(");
    foundClose = inStr.find(")");
    string inGate;
    inGate = inStr.substr(++foundOpen,foundClose-foundOpen);
But using g++, I had to make a small change to the substr function.
    foundOpen = inStr.find("(");
    foundClose = inStr.find(")");
    string inGate;
    inGate = inStr.substr(++foundOpen,foundClose-foundOpen-1);
I am also a beginner with Linux and don’t know how to use gdb. Are there any good tutorials for learning gdb?
Modifying a variable (++foundOpen) and accessing it again in the same statement is generally undefined behavior; at the very least, the order of evaluation can vary. The fact that it ever worked at all is sheer luck. You should change the code so that foundOpen is modified on its own line, before the substr call; then it should work with both compilers.
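A minimal sketch of that fix. The function name text_between_parens and the checks for missing parentheses are additions for illustration; the variable names follow the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text between the first "(" and the first
// ")" in inStr, or an empty string if either one is missing or out of order.
std::string text_between_parens(const std::string& inStr) {
    std::string::size_type foundOpen = inStr.find("(");
    std::string::size_type foundClose = inStr.find(")");
    if (foundOpen == std::string::npos || foundClose == std::string::npos ||
        foundClose < foundOpen) {
        return std::string();
    }
    ++foundOpen;  // modified on its own line, before it is read again
    return inStr.substr(foundOpen, foundClose - foundOpen);
}
```

For example, assert(text_between_parens("AND(a,b)") == "a,b") holds regardless of the order in which the compiler evaluates function arguments, because foundOpen is no longer modified and read in the same expression.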
I’ll take a flying guess: your program uses ‘getch()’ and you found the function in the library -lcurses or -lncurses and are using that library, but your program crashes as you said.
The trouble is, that function requires a certain amount of setup to work — unlike the similarly named but rather different function that is available on Windows.
Welcome to the real world — different platforms have different functions in the standard APIs; sometimes, two platforms have a function with the same name but different meanings.
Infinite abort() in a backtrace of a C++ program core dump
I have a strange problem that I can’t solve. Please help! The program is a multithreaded C++ application that runs on an ARM Linux machine. Recently I began testing it over long runs, and sometimes it crashes after 1-2 days like so:
*** glibc detected ** /root/client/my_program: free(): invalid pointer: 0x002a9408 ***
When I open the core dump, I see that the main thread seems to have a corrupt stack: all I can see is an endless run of abort() calls.
GNU gdb (GDB) 7.3
...
This GDB was configured as "--host=i686 --target=arm-linux".
[New LWP 706]
[New LWP 700]
[New LWP 702]
[New LWP 703]
[New LWP 704]
[New LWP 705]
Core was generated by `/root/client/my_program'.
Program terminated with signal 6, Aborted.
#0 0x001c44d4 in raise ()
(gdb) bt
#0 0x001c44d4 in raise ()
#1 0x001c47e0 in abort ()
#2 0x001c47e0 in abort ()
#3 0x001c47e0 in abort ()
#4 0x001c47e0 in abort ()
#5 0x001c47e0 in abort ()
#6 0x001c47e0 in abort ()
#7 0x001c47e0 in abort ()
#8 0x001c47e0 in abort ()
#9 0x001c47e0 in abort ()
#10 0x001c47e0 in abort ()
#11 0x001c47e0 in abort ()
And it goes on and on. I tried to get to the bottom of it by moving up the stack to frame 3000 or even further, but eventually the core dump runs out of frames and I still can’t see why this happened. When I examine the other threads, everything seems normal there.
(gdb) info threads
  Id   Target Id   Frame
  6    LWP 705     0x00132f04 in nanosleep ()
  5    LWP 704     0x001e7a70 in select ()
  4    LWP 703     0x00132f04 in nanosleep ()
  3    LWP 702     0x00132318 in sem_wait ()
  2    LWP 700     0x00132f04 in nanosleep ()
* 1    LWP 706     0x001c44d4 in raise ()
(gdb) thread 5
[Switching to thread 5 (LWP 704)]
#0 0x001e7a70 in select ()
(gdb) bt
#0 0x001e7a70 in select ()
#1 0x00057ad4 in CSerialPort::read (this=0xbea7d98c, string_buffer=..., delimiter=..., timeout_ms=1000) at CSerialPort.cpp:202
#2 0x00070de4 in CScanner::readResponse (this=0xbea7d4cc, resp_recv=..., timeout=1000, delim=...) at PidScanner.cpp:657
#3 0x00071198 in CScanner::sendExpect (this=0xbea7d4cc, cmd=..., exp_str=..., rcv_str=..., timeout=1000) at PidScanner.cpp:604
#4 0x00071d48 in CScanner::pollPid (this=0xbea7d4cc, mode=1, pid=12, pid_str=...) at PidScanner.cpp:525
#5 0x00072ce0 in CScanner::poll1 (this=0xbea7d4cc)
#6 0x00074c78 in CScanner::Poll (this=0xbea7d4cc)
#7 0x00089edc in CThread5::Thread5Poll (this=0xbea7d360)
#8 0x0008c140 in CThread5::run (this=0xbea7d360)
#9 0x00088698 in CThread::threadFunc (p=0xbea7d360)
#10 0x0012e6a0 in start_thread ()
#11 0x001e90e8 in clone ()
#12 0x001e90e8 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(Class and function names are a bit weird because I changed them.) So thread #1 is where the stack is corrupt; the backtrace of every other thread (2-6) shows
Backtrace stopped: previous frame identical to this frame (corrupt stack?).
This happens because threads 2-6 are created in thread #1. The problem is that I can’t run the program in gdb because it runs on an embedded system, and I can’t use a remote gdb server. The only option is examining core dumps, which do not occur very often. Could you please suggest something that could move me forward? (Maybe there is something else I can extract from the core dump, or some way to add hooks in the code to catch the abort() call.)
UPDATE: Basile Starynkevitch suggested using Valgrind, but it turns out it has only been ported to ARMv7. I have an ARM 926, which is ARMv5, so this won’t work for me. There are some efforts to compile Valgrind for ARMv5, though: "Valgrind cross compilation for ARMv5tel" and "valgrind on the ARM9".
UPDATE 2: I couldn’t make Electric Fence work with my program. The program uses C++ and pthreads. The version of Efence I got, 2.1.13, crashed in an arbitrary place after I started a thread and tried to do something more or less complicated (for example, putting a value into an STL vector). I saw people mentioning some patches for Efence on the web but didn’t have time to try them. I tried this on my Linux PC, not on the ARM, and other tools like Valgrind or Dmalloc don’t report any problems with the code. So anyone using version 2.1.13 of Efence should be prepared for problems with pthreads (or maybe pthreads + C++ + STL, I don’t know).
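On the idea of hooking the abort() call mentioned in the question, one possible sketch is a SIGABRT handler that writes a short report before letting the default action produce the core. This assumes a POSIX system. The names on_abort, install_abort_hook and abort_hook_reports are invented for this example, and the backtrace() call is a glibc extension that is compiled in only when glibc is detected. Whether it yields useful frames on a given ARM toolchain is not something this sketch can promise, and after heap corruption any work done in a handler is best-effort.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__GLIBC__)
#include <execinfo.h>  // backtrace(), backtrace_symbols_fd(): glibc extensions
#endif

static int report_fd = STDERR_FILENO;  // where the handler writes its report

// Handler: write a marker (and, on glibc, raw frame addresses), then restore
// the default action and re-raise so the process still dumps core.
extern "C" void on_abort(int sig) {
    static const char msg[] = "SIGABRT caught\n";
    ssize_t ignored = write(report_fd, msg, sizeof msg - 1);
    (void)ignored;
#if defined(__GLIBC__)
    void* frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, report_fd);
#endif
    std::signal(sig, SIG_DFL);
    std::raise(sig);  // delivered with the default action once the handler returns
}

// Hypothetical helper: installs the hook and selects the report descriptor.
void install_abort_hook(int fd) {
    report_fd = fd;
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_abort;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, nullptr);
}

// Demo: a forked child installs the hook and aborts; the parent collects the
// report through a pipe and checks that the child still died from SIGABRT.
bool abort_hook_reports() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        close(fds[0]);
        install_abort_hook(fds[1]);
        std::abort();
    }
    close(fds[1]);
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::string::size_type>(n));
    }
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT &&
           out.find("SIGABRT caught") == 0;
}
```

In the real program only install_abort_hook(STDERR_FILENO) (or a log file descriptor) would be called early in main(); the demo function exists just to show that the report is written and that the core-producing termination still happens, so assert(abort_hook_reports()) should hold.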
How to programmatically cause a core dump in C/C++
I would like to force a core dump at a specific location in my C++ application. I know I can do it by doing something like:
int * crash = NULL; *crash = 1;
BTW, that method doesn’t work in all UNIXes. HPUX, for one, allows you to read and write NULL with impunity (thankfully, this is configurable).
Raising of signal number 6 ( SIGABRT in Linux) is one way to do it (though keep in mind that SIGABRT is not required to be 6 in all POSIX implementations so you may want to use the SIGABRT value itself if this is anything other than quick’n’dirty debug code).
Calling abort() will also cause a core dump, and you can even do this without terminating your process by calling fork() followed by abort() in the child only — see this answer for details.
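The fork()-then-abort() idea can be sketched as follows, assuming a POSIX system. The function name dump_core_and_continue is made up for this example. Note that only the thread calling fork() exists in the child, so the resulting core reflects the parent's memory at that moment but not its other threads.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: lets a forked copy of the process abort (and dump
// core, if the system allows it) while the caller keeps running.
// Returns true if the child was indeed terminated by SIGABRT.
bool dump_core_and_continue() {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        std::abort();  // child only: a snapshot of the parent at fork() time
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

After a call such as assert(dump_core_and_continue()), the calling process carries on normally. Waiting for the child is a design choice here, made so that no zombie process is left behind.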
No, you’re right, it’s not but I tend not to worry too much about the correctness of debug code. If that escapes into the wild, the cleanliness of my code is the least of my worries 🙂
Calling abort() may be useless on some architectures with some compilers and some C libraries (like gcc and glibc or uClibc on ARM) because the abort() function is declared with a noreturn attribute and the compiler totally optimizes out all the return information, which makes the core file unusable. You can’t trace it past the call to raise() or abort() itself. So it is much better to call raise(SIGABRT) directly or kill(getpid(), SIGABRT), which is virtually the same.
Sorry, on ARM the same thing happens even with raise(SIGABRT). So the only way to produce a traceable core file is kill(getpid(), SIGABRT)
A few years ago, Google released the coredumper library.
Overview
The coredumper library can be compiled into applications to create core dumps of the running program — without terminating. It supports both single- and multi-threaded core dumps, even if the kernel does not natively support multi-threaded core files.
Coredumper is distributed under the terms of the BSD License.
Example
This is by no means a complete example; it simply gives you a feel for what the coredumper API looks like.
    #include <google/coredumper.h>
    ...
    WriteCoreDump("core.myprogram");
    /* Keep going, we generated a core file,
     * but we didn't crash.
     */
It’s not what you were asking for, but maybe it’s even better 🙂