process Tracing tools in linux
In Linux, I want to trace the function calls made by the new process before it actually starts up. Which tool would help me with this? Ex: A call would be made to ‘do_fork’ for the creation of new process. Would like to know the flow of such calls for a new process. Correct me if am wrong in understanding of the starting of process. Thanks in advance
Can you explain exactly (relative to the strace(1) utility) what you need? I.e. what is that strace doesn’t give you?
I tried out with the ‘perf sched’ tool.. But I couldn’t get the flow of function calls.. May be am missing out some options.. Will go through man page of strace..
@hroptatyr Am not finding the do_fork or any such calls.. Can you please help me with the options to be used..
2 Answers 2
As I suggested in the comments, strace the parent shell, for instance like so:
strace -fF -v -p PIDOF_PARENT_SHELL
You need to get the shell’s pid first (e.g. through echo $$ ), the -fF makes sure that strace follows forks and vforks.
Here’s a little example trace (zsh starting a program called test):
24077 15:03:07.094522 alarm(0) = 0 24077 15:03:07.097920 ioctl(10, SNDCTL_TMR_STOP or TCSETSW, ) = 0 24077 15:03:07.098450 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 24077 15:03:07.098830 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 24077 15:03:07.099033 pipe([3, 4]) = 0 24077 15:03:07.099559 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f663a44c9d0) = 32626 24077 15:03:07.103197 close(4) = 0 24077 15:03:07.103413 read(3, 32626 15:03:07.103700 close(3) = 0 32626 15:03:07.105151 setpgid(0, 32626) = 0 32626 15:03:07.106440 ioctl(10, TIOCSPGRP, [32626]) = 0 32626 15:03:07.106717 close(10) = 0 32626 15:03:07.106933 rt_sigaction(SIGTTOU, , , 8) = 0 32626 15:03:07.111713 rt_sigaction(SIGTTIN, , , 8) = 0 32626 15:03:07.112005 rt_sigaction(SIGTSTP, , , 8) = 0 32626 15:03:07.112253 rt_sigaction(SIGTERM, , , 8) = 0 32626 15:03:07.112493 rt_sigaction(SIGINT, , , 8) = 0 32626 15:03:07.112732 rt_sigaction(SIGQUIT, , , 8) = 0 32626 15:03:07.113558 getrusage(RUSAGE_CHILDREN, , ru_stime=, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=0, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=0, ru_nivcsw=0>) = 0 32626 15:03:07.114989 close(4 24077 15:03:07.115109 "", 1) = 0 24077 15:03:07.116448 close(3) = 0 24077 15:03:07.116804 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 24077 15:03:07.117025 rt_sigsuspend([] 32626 15:03:07.117147 ) = 0 32626 15:03:07.117471 rt_sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD], 8) = 0 32626 15:03:07.118987 execve("/home/freundt/temp/test", ["test"], [. ]]) = 0 32626 15:03:07.264156 brk(0) = 0x60a000 .
And you can clearly see the call to clone() and execve() along with other interesting stuff.
Profiling and Tracing Processes in Linux
This article shows how a Linux process can be traced and profiled. Using the “last” command as an example, profiling is used to explain why a process was very slow, and why another, very similar, process (dump-utmp) was much faster. “Tracing” here means seeing what a process is doing at any moment. “Profiling” means showing (afterwards) how long it spent doing different things.
“Last” is a Linux command that reads and summarizes the utmp file, where login records are stored. I had a “last” command taking hours to complete because the utmp file had grown large (1.9 Gb). Used strace to see what it was doing.
[root@pluto log]# strace -p 14974 Process 14974 attached - interrupt to quit _llseek(3, 1975549952, [1975549952], SEEK_SET) = 0 read(3, "". 16384) = 16384 _llseek(3, 1975533568, [1975533568], SEEK_SET) = 0 read(3, "". 16384) = 16384 write(1, "00:00) nskapasi vsftpd:8313 ". 4096) = 4096 _llseek(3, 1975517184, [1975517184], SEEK_SET) = 0
Last processes the wtmp file in reverse, which is why it is calling llseek() all the time to move backwards in the file (by 16 k chunks). However this does not explain the slowness. The “last” process was maxing out one CPU, ie. it was CPU bound. Why so ?
Using strace with the -c flag to profile it:
# strace -c -p 14974
Process 14974 attached - interrupt to quit Process 14974 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 46.85 0.000699 19 36 read 45.17 0.000674 42 16 write 7.10 0.000106 3 36 _llseek 0.87 0.000013 7 2 brk ------ ----------- ----------- --------- --------- ---------------- 100.00 0.001492 90 total
The process is spending only about 7% of its time in llseek. Whatever is thrashing the CPU, it isn’t llseek.
There is another profiling command called ltrace. It is like strace but traces library calls rather than system calls. Used it thus. NB the PID changed because I restarted “last“.
# ltrace -p 15459 strncmp("vsftpd:24274", "vsftpd:10592", 32) = 1 strncmp("vsftpd:24281", "vsftpd:10592", 32) = 1 strncmp("vsftpd:24288", "vsftpd:10592", 32) = 1 strncmp("vsftpd:24295", "vsftpd:10592", 32) = 1 strncmp("vsftpd:24302", "vsftpd:10592", 32) = 1 strncmp("vsftpd:24309", "vsftpd:10592", 32) = 1
…and so on. The “last” process sure is making a lot of calls to strncmp().
ltrace offers profiling too, again with addition of a -c flag:
# ltrace -c -p 15459 % time seconds usecs/call calls function ------ ----------- ----------- --------- -------------------- 99.92 5.655910 81 69222 strncmp 0.06 0.003652 23 154 _IO_putc 0.01 0.000288 24 12 memcpy 0.00 0.000163 32 5 ctime 0.00 0.000126 25 5 free 0.00 0.000099 24 4 sprintf 0.00 0.000098 24 4 malloc 0.00 0.000096 24 4 strncat 0.00 0.000052 26 2 snprintf 0.00 0.000050 25 2 strcpy 0.00 0.000026 26 1 printf ------ ----------- ----------- --------- -------------------- 100.00 5.660560 69415 total
The “last” process is spending nearly all of its time (99.92%) in strncmp(), a compute-intensive C library function for comparing 2 strings. Which explains why the last process was hogging a whole CPU and executing so slowly.
The wmtp file had grown so large because the system (an FTP server) was receiving up to 300 FTP connections per minute and logging them all.
Regarding wtmp, there exists another command called dump-utmp, which can process the whole 1.9 Gb utmp file in 2.5 minutes (“last” would take at least 3 hours, possibly many more, if left to complete. The full command was:
# dump-utmp ./wtmp > test.out
Using strace and ltrace on dump-utmp shows that the command processes the wtmp file forwards and so does not need to call lseek() all the time like “last“. Also it just seems to read and write the data, not bothering with comparing strings or other processing, and produces a much more “raw” report than the output of “last“.
So Linux contains some pretty cool tracing tools. The above was on Red Hat. A quick shufty indicates ltrace and strace are readily available in Fedora 16 and Debian Squeeze, for example.
Footnote
Truss offers strace-like functionality in Solaris. Use truss -p
for tracing system calls, and truss -c -p
for profiling. The options for tracing library calls are a little more complicated – see this example.
One thought on “ Profiling and Tracing Processes in Linux ”
How do I trace a system call in Linux?
How would I follow a system call from a trap to the kernel, to how arguments are passed, to how the system call in located in the kernel, to the actual processing of the system call in the kernel, to the return back to the user and how state is restored?
3 Answers 3
This is the most powerful method I’ve found so far. It can even show the call arguments: Does ftrace allow capture of system call arguments to the Linux kernel, or only function names?
sudo apt-get install systemtap sudo stap -e 'probe syscall.mkdir < printf("%s[%d] ->%s(%s)\n", execname(), pid(), name, argstr) >'
sudo rm -rf /tmp/a /tmp/b mkdir /tmp/a mkdir /tmp/b
mkdir[4590] -> mkdir("/tmp/a", 0777) mkdir[4593] -> mkdir("/tmp/b", 0777)
Tested on Ubuntu 18.04, Linux kernel 4.15.
ltrace -S shows both system calls and library calls
This awesome tool therefore gives even further visibility into what executables are doing.
ftrace minimal runnable example
Mentioned at https://stackoverflow.com/a/29840482/895245 but here goes a minimal runnable example.
#!/bin/sh set -eux d=debug/tracing mkdir -p debug if ! mountpoint -q debug; then mount -t debugfs nodev debug fi # Stop tracing. echo 0 > "$/tracing_on" # Clear previous traces. echo > "$/trace" # Find the tracer name. cat "$/available_tracers" # Disable tracing functions, show only system call events. echo nop > "$/current_tracer" # Find the event name with. grep mkdir "$/available_events" # Enable tracing mkdir. # Both statements below seem to do the exact same thing, # just with different interfaces. # https://www.kernel.org/doc/html/v4.18/trace/events.html echo sys_enter_mkdir > "$/set_event" # echo 1 > "$/events/syscalls/sys_enter_mkdir/enable" # Start tracing. echo 1 > "$/tracing_on" # Generate two mkdir calls by two different processes. rm -rf /tmp/a /tmp/b mkdir /tmp/a mkdir /tmp/b # View the trace. cat "$/trace" # Stop tracing. echo 0 > "$/tracing_on" umount debug
# tracer: nop # # _-----=> irqs-offhttps://sourceware.org/systemtap/documentation.html # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | mkdir-5619 [005] . 10249.262531: sys_mkdir(pathname: 7fff93cbfcb0, mode: 1ff) mkdir-5620 [003] . 10249.264613: sys_mkdir(pathname: 7ffcdc91ecb0, mode: 1ff)
One cool thing about this method is that it shows the function call for all processes on the system at once, although you can also filter PIDs of interest with set_ftrace_pid .
Tested on Ubuntu 18.04, Linux kernel 4.15.
GDB step debug the Linux kernel
Depending on the level of internals detail you need, this is an option: How to debug the Linux kernel with GDB and QEMU?
strace minimal runnable example
Here is a minimal runnable example of strace : How should strace be used? with a freestanding hello world, which makes how everything works perfectly clear.
perf top -F 49 -e raw_syscalls:sys_enter --sort comm,dso --show-nr-samples
and the BPF-based traceloop: https://github.com/kinvolk/traceloop which the article claims to be a very fast method:
sudo -E ./traceloop cgroups --dump-on-exit /sys/fs/cgroup/system.slice/sshd.service