Logging the RAM high-water mark of a Linux process
I would like to log the RAM high-water mark of a Linux process (kernel 3.2.0-36) once the process finishes. I am calling the process from a Perl script, something like:
my $cmd = "logmemory -o mem.log mycmd options 1>cmd.out 2>cmd.err";
unless (system($cmd) == 0) {
    # handle failure
}
3 Answers
Take a look at /proc/[pid]/status, specifically the VmHWM field (the peak resident set size, or "high water mark").
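If you just want to poll that field while the process is alive, a minimal C sketch along these lines would do (the pid is passed as an argument, error handling kept to a minimum):

// vmhwm.c -- minimal sketch: print the VmHWM line from /proc/<pid>/status
// (only works while the process is still alive; the pid is passed as argv[1])
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s pid\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    if (!(f = fopen(path, "r"))) {
        perror(path);
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        if (strstr(line, "VmHWM"))
            fputs(line, stdout);
    fclose(f);
    return 0;
}

Note that this only works while the process still exists.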
Alternatively, you can use the /usr/bin/time -v command. Here's an example of its output:
Command exited with non-zero status 1
        Command being timed: "xz -9ek access_log.3 access_log.xz"
        User time (seconds): 6.96
        System time (seconds): 0.34
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        **Maximum resident set size (kbytes): 383456**
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 24000
        Voluntary context switches: 3
        Involuntary context switches: 225
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
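The figure time reports as "Maximum resident set size" comes from the rusage accounting the kernel hands to the parent when it reaps the child, so a wrapper of your own can read it directly after the child exits. A rough sketch using that same kernel interface (not necessarily how the time binary itself is implemented; the command to run is whatever follows on the command line):

// maxrss.c -- sketch: run a command and print its peak RSS once it has exited,
// using the rusage accounting the kernel reports to the parent
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pid_t pid;
    int status;
    struct rusage ru;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    pid = fork();
    if (pid == 0) {
        // Child: run the command via $PATH
        execvp(argv[1], argv + 1);
        perror("execvp");
        _exit(127);
    }
    // Parent: reap the child and collect its resource usage
    if (wait4(pid, &status, 0, &ru) == -1) {
        perror("wait4");
        return 1;
    }
    // On Linux, ru_maxrss is reported in kilobytes
    printf("Maximum resident set size: %ld kB\n", ru.ru_maxrss);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

The limitation is that this only gives the peak resident set size; VmPeak (peak virtual size) is not available this way.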
A difficulty with the /proc/[pid]/status approach is that it relies on the process still being alive, which means you can never use it to measure right up to the end of the process's execution. This is particularly problematic if you're hoping to catch a process that is about to exhaust system RAM: it might get OOM-killed before you can poll its memory use.
The RAM high-water mark information for a process is already collected for you by the kernel (from man proc):

/proc/[pid]/status
    Provides much of the information in /proc/[pid]/stat and /proc/[pid]/statm
    in a format that's easier for humans to parse. (...)
    * VmHWM: Peak resident set size ("high water mark").
    (...)
The tricky part is that this value should be read an instant before the process terminates.
I tried different approaches (more on that at the end of the answer) and the one that worked for me was an implementation in C:
- logmemory invokes fork() to create a child process.
- The child process calls ptrace() so that the parent process (which is logmemory ) is notified every time the child executes a system call.
- The child process uses execvp() to run mycmd .
- logmemory patiently waits for a notification. When one arrives, it checks whether mycmd invoked exit_group. If so, it reads /proc/[pid]/status, copies the values to mem.log and detaches from the child. Otherwise, logmemory lets mycmd continue and waits for the next notification.
The downside is that ptrace() slows down the monitored program; I show some comparisons below.
This version of logmemory not only logs VmHWM but also:
- VmPeak (peak virtual memory size, which includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used)
- a timestamp
- the command name and arguments
This is the code, which can surely be improved (I'm not proficient in C). It works as intended, though, tested on 32-bit Ubuntu 12.04 and 64-bit SuSE Linux Enterprise Server 10 SP4:
// logmemory.c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STRINGLENGTH 2048

int main(int argc, char **argv) {
    pid_t child_pid;
    long syscall;
    int status, index;
    FILE *statusfile, *logfile;
    char opt, statusfile_path[STRINGLENGTH], line[STRINGLENGTH],
         command[STRINGLENGTH], logfile_path[STRINGLENGTH] = "";
    time_t now;
    extern char *optarg;
    extern int optind;

    // Error checking
    if (argc == 1) {
        printf("Error: program to execute is missing. Exiting.\n");
        return 0;
    }

    // Get options
    while ((opt = getopt(argc, argv, "+o:")) != -1)
        switch (opt) {
            case 'o':
                strncpy(logfile_path, optarg, 2048);
                break;
            case ':':
                fprintf(stderr, "Aborting: argument for option -o is missing\n");
                return 1;
            case '?':
                fprintf(stderr, "Aborting: only valid option is -o\n");
                return 1;
        }

    // More error checking
    if (!strcmp(logfile_path, "")) {
        fprintf(stderr, "Error: log filename can't be empty\n");
        return 1;
    }

    child_pid = fork();

    // The child process executes this:
    if (child_pid == 0) {
        // Trace child process:
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        // Execute command using $PATH
        execvp(argv[optind], (char * const *)(argv + optind));

    // The parent process executes this:
    } else {
        // Loop until child process terminates
        do {
            // Set ptrace to stop when syscall is executed
            ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
            wait(&status);
            // Get syscall number
            syscall = ptrace(PTRACE_PEEKUSER, child_pid,
#ifdef __i386__
                             4 * ORIG_EAX,
#else
                             8 * ORIG_RAX,
#endif
                             NULL);
        } while (syscall != SYS_exit_group);

        // Construct path to status file and check whether status and log file can be opened
        snprintf(statusfile_path, STRINGLENGTH, "/proc/%d/status", child_pid);
        if (!(logfile = fopen(logfile_path, "a+")) || !(statusfile = fopen(statusfile_path, "r"))) {
            ptrace(PTRACE_DETACH, child_pid, NULL, NULL);
            return 1;
        }

        // Copy timestamp and command to logfile
        now = time(NULL);
        fprintf(logfile, "Date: %sCmd: ", asctime(localtime(&now)));
        for (index = optind; index < argc; index++)
            fprintf(logfile, " %s", argv[index]);
        fprintf(logfile, "\n");

        // Read status file line by line and copy lines containing VmPeak and VmHWM to logfile
        while (fgets(line, STRINGLENGTH, statusfile)) {
            if (strstr(line, "VmPeak") || strstr(line, "VmHWM"))
                fprintf(logfile, "%s", line);
        }
        fprintf(logfile, "\n");

        // Close files
        fclose(statusfile);
        fclose(logfile);

        // Detach from child process
        ptrace(PTRACE_DETACH, child_pid, NULL, NULL);
    }
    return 0;
}
Save it as logmemory.c and compile like this:
$ gcc logmemory.c -o logmemory
$ ./logmemory
Error: program to execute is missing. Exiting.
$ ./logmemory -o mem.log ls -l
(...)
$ ./logmemory -o mem.log free
             total       used       free     shared    buffers     cached
Mem:       1025144     760660     264484          0       6644     143980
-/+ buffers/cache:      610036     415108
Swap:      1046524     544228     502296
$ ./logmemory -o mem.log find /tmp -name \*txt
(...)
$ cat mem.log
Date: Mon Feb 11 21:17:55 2013
Cmd:  ls -l
VmPeak:     5004 kB
VmHWM:      1284 kB

Date: Mon Feb 11 21:18:01 2013
Cmd:  free
VmPeak:     2288 kB
VmHWM:       448 kB

Date: Mon Feb 11 21:18:26 2013
Cmd:  find /tmp -name *txt
VmPeak:     4700 kB
VmHWM:       908 kB
I wrote this C program to test logmemory's accuracy:
// bigmalloc.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ITERATIONS 200

int main(int argc, char **argv) {
    int i = 0;
    for (i = 0; i < ITERATIONS; i++) {
        // Allocate and touch 1 MiB per iteration (never freed), so the
        // resident set grows to roughly ITERATIONS megabytes
        char *p = malloc(1024 * 1024);
        memset(p, 1, 1024 * 1024);
    }
    return 0;
}
Compile as usual and run it inside logmemory :
$ gcc bigmalloc.c -o bigmalloc
$ ./logmemory -o mem.log ./bigmalloc
$ tail mem.log
Date: Mon Feb 11 21:26:01 2013
Cmd:  ./bigmalloc
VmPeak:   207604 kB
VmHWM:    205932 kB
which correctly reports 200 MB used.
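(Assuming each of the 200 iterations allocates and touches 1 MiB, as in the listing above, the expected figure is 200 × 1024 kB = 204800 kB; the extra megabyte or so in VmHWM is accounted for by the C runtime itself.)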
As a side note: time (at least on Ubuntu 12.04) surprisingly outputs a value that differs greatly from what the kernel reports (roughly four times larger):
$ /usr/bin/time --format %M ./bigmalloc
823872
The man page documents M as: "Maximum resident set size of the process during its lifetime, in Kilobytes."
As mentioned above, this comes at a price: logmemory slows down the execution of the monitored program. For example:
$ time ./logmemory -o mem.log ./bigmalloc

real    0m0.288s
user    0m0.000s
sys     0m0.004s

$ time ./bigmalloc

real    0m0.104s
user    0m0.008s
sys     0m0.092s

$ time find /var -name \*log
(...)
real    0m0.036s
user    0m0.000s
sys     0m0.032s

$ time ./logmemory -o mem.log find /var -name \*log
(...)
real    0m0.124s
user    0m0.000s
sys     0m0.052s
Other approaches that I (unsuccessfully) tried were:
- A shell script that creates a background process to read /proc/[pid]/status while mycmd runs.
- A C program that forks and exec's mycmd but pauses until the child is a zombie, thus avoiding ptrace and the overhead it creates. Nice idea, I thought; unfortunately, VmHWM and VmPeak are no longer available from /proc/[pid]/status once the child is a zombie.
Syslog analysis: Memory problems
The performance of a server also depends on its memory. When both the RAM and the swap space are full, the server runs out of memory, and the kernel's next response is to kill a process that is consuming a lot of memory. The OOM (Out Of Memory) killer is the mechanism the kernel uses to recover memory on the system. Its primary objective is to kill as few processes as possible while freeing as much memory as possible; as a result, it kills the process using the most memory first.
When a critical process needs to be started and it requires more memory than is available, the kernel starts killing processes and records these events with strings such as "Out of memory" in the log data.
The occurrence of such events indicates that the server killed the process intentionally to free up memory.
While troubleshooting memory issues, spotting such events is essential, as they help you understand which process caused the problem.
Here are some examples of log data that denote memory issues:
Jan 3 21:30:26 ip-172-31-34-37 kernel: [ 1575.404070] Out of memory: Kill process 16471 (memkiller) score 838 or sacrifice child
Jan 3 21:30:26 ip-172-31-34-37 kernel: [ 1575.408946] Killed process 16471 (memkiller) total-vm:144200240kB, anon-rss:562316kB, file-rss:0kB, shmem-rss:0kB
Jan 3 21:30:27 ip-172-31-34-37 kernel: [ 1575.518686] oom_reaper: reaped process 16471 (memkiller), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Memory issues can be investigated by analyzing the logs stored in the kernel log (/var/log/kern.log) or in the syslog (/var/log/syslog). You can analyze the logs manually with the grep command to find the cause of the memory issue. However, running grep itself needs memory, so it is advisable to store all your syslogs centrally on a separate server and perform the analysis there. You can also manually group processes and configure which process should be killed first and which crucial processes must be kept running, but this is time-consuming, as the number of logs generated will be high.
Alternatively, you can use a comprehensive log management solution such as EventLog Analyzer to centralize all your syslogs and automatically analyze them for better insights. The solution offers real-time alerts and predefined reports for low disk space, warning events, information events, etc.
A log management solution can be configured to trigger an alert when the system is running out of memory. This helps you take immediate action so that crucial processes can keep running.
Check out how EventLog Analyzer can help you detect and resolve memory problems in the network. With 300+ predefined alert criteria, EventLog Analyzer can quickly identify security incidents and send real-time SMS or email notifications to the administrators.
Debug out-of-memory with /var/log/messages
It doesn't matter whether this problem is with httpd, mysqld or postfix, but I am curious how I can continue debugging it. How can I get more information about why PID 9163 was killed? I am not sure whether Linux keeps a history of terminated PIDs somewhere. If this occurs in your messages log file, how would you troubleshoot the issue step by step?
# free -m
             total       used       free     shared    buffers     cached
Mem:          1655        934        721          0         10         52
-/+ buffers/cache:         871        784
Swap:          109          6        103
2 Answers
The kernel will have logged a bunch of stuff before this happened, but most of it will probably not be in /var/log/messages, depending on how your (r)syslogd is configured. Try:
grep oom /var/log/*
grep total_vm /var/log/*
The former should show up a bunch of times and the latter in only one or two places. That is the file you want to look at.
Find the original "Out of memory" line in one of the files that also contains total_vm. Thirty seconds to a minute (could be more, could be less) before that line you'll find something like:
kernel: foobar invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
You should also find a table somewhere between that line and the "Out of memory" line with headers like this:
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
This may not tell you much more than you already know, but the fields are:
- pid The process ID.
- uid User ID.
- tgid Thread group ID.
- total_vm Virtual memory use (in 4 kB pages)
- rss Resident memory use (in 4 kB pages)
- nr_ptes Page table entries
- swapents Swap entries
- oom_score_adj Usually 0; a lower number indicates the process will be less likely to die when the OOM killer is invoked.
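Since total_vm and rss are counted in 4 kB pages, multiply by 4 to get kilobytes: an rss of, say, 140579 pages corresponds to 140579 × 4 kB ≈ 562 MB of resident memory.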
You can mostly ignore nr_ptes and swapents, although I believe these are factors in determining who gets killed. The victim is not necessarily the process using the most memory, but it very likely is. For more about the selection process, see here. Basically, the process that ends up with the highest OOM score is killed; that's the "score" reported on the "Out of memory" line. Unfortunately the other scores aren't reported, but that table provides some clues about the factors involved.
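If you want to see how the kernel would rank the processes currently running, the score is exported as /proc/[pid]/oom_score. A minimal sketch that prints it for one pid (passed as an argument):

// oomscore.c -- sketch: print the kernel's current OOM score for one process
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64];
    int score;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s pid\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/oom_score", argv[1]);
    // If the file can't be opened or parsed, report the error and bail out
    if (!(f = fopen(path, "r")) || fscanf(f, "%d", &score) != 1) {
        perror(path);
        return 1;
    }
    printf("oom_score of pid %s: %d\n", argv[1], score);
    return 0;
}

Comparing these values across your long-running daemons gives a rough idea of which one would be picked next.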
Again, this probably won't do much more than illuminate the obvious: the system ran out of memory and mysqld was chosen to die because killing it would release the most resources. This does not necessarily mean mysqld is doing anything wrong. You can look at the table to see if anything else went way out of line at the time, but there may not be any clear culprit: the system can run out of memory simply because you misjudged or misconfigured the running processes.