How to log the memory consumption on Linux?
Is there any ready-to-use solution to log memory consumption from system start? I'd like to log the data to a simple text file or a database so I can analyze it later.
I'm working on a Linux 2.4-based embedded system and need to debug a problem related to memory consumption. My application starts automatically on every system start. I need a way to get data with timestamps at regular intervals (as often as possible) so I can track down the problem.
The symptoms: when the system starts, it launches my main application and a GUI to visualize the main parameters of the system. The GUI is based on GTK+ (X server). If I disable the GUI and X server, my application works OK. If I enable the GUI and X server, it does not work when 256 MiB or 512 MiB of physical memory is installed on the motherboard. With 1 GiB of memory installed, everything is OK.
Yes, previous versions ran OK on the same system. Now we are developing a new version and have started hitting this problem.
8 Answers
The following script prints time stamps and a header.
#!/bin/bash -e
echo "      date     time $(free -m | grep total | sed -E 's/^    (.*)/\1/g')"
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(free -m | grep Mem: | sed 's/Mem://g')"
    sleep 1
done
The output looks like this (tested on Ubuntu 15.04, 64-bit).
      date     time   total   used   free  shared  buffers  cached
2015-08-01 13:57:27   24002   13283  10718    522      693    2308
2015-08-01 13:57:28   24002   13321  10680    522      693    2308
2015-08-01 13:57:29   24002   13355  10646    522      693    2308
2015-08-01 13:57:30   24002   13353  10648    522      693    2308
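To keep a log file instead of watching the terminal, the script's output can simply be redirected; a minimal usage sketch, assuming it was saved as memlog.sh (a made-up name):
chmod +x memlog.sh
./memlog.sh >> memory.log &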
rm memory.log
while true; do free >> memory.log; sleep 1; done
free -s 1 > memory.log; this doesn't incur the cost of launching a new process every second. It doesn't have timestamping, though. (I know this is an old post, but I was looking for the same thing, and so will others.)
In some versions of free, a bug makes it necessary to also specify the number of iterations with -c; otherwise it fails with the error free: seconds argument '1' failed.
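If you do want timestamps without paying for a new process per sample, one option is to timestamp free's own periodic output in a single long-running awk; this is only a sketch and assumes GNU awk for strftime:
# if your free build has the -c bug mentioned above, add e.g. -c 100000
free -m -s 1 | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush() }' > memory.log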
@hirschhornsalz: Is it possible to read a specific procfs/sysfs entry and log memory usage from it? I am using a program written in C and would like to log with minimal overhead.
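On the data source: free reads /proc/meminfo (procfs rather than sysfs), so a C program can open that file once and re-read it at each interval without spawning any processes. A quick shell check of the relevant fields:
grep -E '^(MemTotal|MemFree|Buffers|Cached):' /proc/meminfo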
I am a big fan of logging everything and I find it useful to know which processes are using the memory and how much each process is using (as well as summary statistics). The following command records a top printout ordered by memory consumption every 0.5 seconds:
top -bd0.5 -o +%MEM > memory.log
Just note that the log file will grow a lot faster than if you only store the total memory utilization statistics, so be sure you don't run out of disk space.
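If disk space is a concern, one option (a sketch, not from the original answer) is to compress the stream as it is written; note that gzip buffers its input, so the file grows in chunks rather than line by line:
top -bd0.5 -o +%MEM | gzip > memory.log.gz
# inspect later with:
zcat memory.log.gz | less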
There is a tool called sar on *nix systems. You could try to use that to monitor memory usage; it takes measurements at regular intervals. Do a man sar for more details. I think the option is -r for taking memory measurements and -i to specify the interval you'd like.
Definitely more powerful than the other proposed solutions, although a bit more cumbersome to set up.
e.g. sar -r 2 5 for reporting every two seconds, five times. It also allows you to export the output file in binary format and read it back later.
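The binary round-trip might look like this sketch (the file name is arbitrary):
sar -r -o /tmp/memory.sar 2 5   # -o saves the samples to a binary data file
sar -r -f /tmp/memory.sar       # -f reads the binary file back as a report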
I think adding a crontab entry will be enough
*/5 * * * * free -m >> some_output_file
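If you also want a timestamp per sample, a variant of the same entry could look like this sketch (in a crontab, % must be escaped with a backslash; some_output_file is the same placeholder as above):
*/5 * * * * (date '+\%F \%T'; free -m) >> some_output_file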
There are other tools like SeaLion, New Relic, Server Density, etc. which do almost the same thing but are much easier to install and configure. My favorite is SeaLion, as it is free and also gives an awesome timeline view of raw outputs of common Linux commands.
+1 for SeaLion, the fastest signup/setup ever. The whole thing took me about six seconds: I typed in my email and a password, pasted a single command into ssh, and boom, my stats appeared.
You could put something like the loop sketched below into a startup script. Since your application is already in startup, you could just add this line to the end of the initialization script your application is already using (where X is the number of seconds between log messages).
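A minimal sketch of such a line, assuming memory.log as the output path:
while true; do free -m >> memory.log; sleep X; done &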
To periodically log the memory usage efficiently, I combined another answer here with a method to only retain the top-K memory-using processes.
top -bd 1.5 -o +%MEM | grep 'load average' -A 9 > memory_usage.log
This command will record, every 1.5 s, the top header information and the 3 highest memory-consuming processes (there is a 6-line offset for top's header information). This saves lots of disk space compared with recording top's information for every process.
So I know that I am late to this game, but I just came up with this answer, as I needed to do this and really didn't want the extra fields that vmstat, free, etc. all seem to output without extra filtering. So here is the answer that I came up with:
top -bd 0.1 | grep 'KiB Mem' | cut -d' ' -f10 > memory.txt
or, with tee so the values also show on screen while being logged:
top -bd 0.1 | grep 'KiB Mem' | cut -d' ' -f10 | tee memory.txt
The standard output from top when grepping for 'KiB Mem' is:
KiB Mem : 16047368 total, 8708172 free, 6015720 used, 1323476 buff/cache
By running this through cut, we filter down to literally just the number prior to 'used'.
The user can indeed modify the 0.1 to another number in order to run different capture sample rates. In my case I wanted to use top because you can collect memory stats faster than one capture per second; as you can see here, I wanted to capture a stat every 1/10th of a second.
NOTES: It turns out that piping through cut causes a massive delay in getting anything out to the file. As we later found out, it is much faster to leave out the cut command during data acquisition and then run cut on the output file afterwards. Also, we had no need for timestamps in our tests.
This thus looks as follows:
Begin Logging:
top -bd 0.1 | grep 'KiB Mem' | tee memory_raw.txt
Exit Logging: stop the capture with Ctrl+C once you have collected enough data.
Then apply 2 levels of cut (filtering), first by comma, then by space. This is needed because of the alignment of top's output and gives much cleaner results:
cut -d',' -f3 memory_raw.txt | tee memory_used_withlabel.txt
cut -d' ' -f3 memory_used_withlabel.txt | tee memory_used.txt
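Equivalently, a single awk pass over the raw log extracts the bare 'used' number directly; just a sketch, assuming the same 'KiB Mem' line layout shown above:
# field 3 (split on commas) is ' 6015720 used'; take its first token
awk -F',' '{ split($3, a, " "); print a[1] }' memory_raw.txt > memory_used.txt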
Syslog analysis: Memory problems
The performance of a server depends on its memory too. When the RAM and the swap space are full, the server runs out of memory, and the kernel's next response is to kill a process that is taking a lot of memory. The OOM (Out Of Memory) killer is the mechanism the kernel uses to recover memory on the system. Its primary objective is to kill the fewest processes possible while freeing up the maximum amount of memory. As a result, it kills the process that uses the most memory first.
When a critical process needs to be started and it requires more memory than what's available, the kernel starts killing processes and records these events with strings such as "Out of memory" in the log data.
The occurrence of such events indicates that the server killed the process intentionally to free up memory.
While troubleshooting memory issues, spotting such events is essential, as they help you understand which process caused the memory problem.
Here are some examples of log data that denote memory issues:
Jan 3 21:30:26 ip-172-31-34-37 kernel: [ 1575.404070] Out of memory: Kill process 16471 (memkiller) score 838 or sacrifice child
Jan 3 21:30:26 ip-172-31-34-37 kernel: [ 1575.408946] Killed process 16471 (memkiller) total-vm:144200240kB, anon-rss:562316kB, file-rss:0kB, shmem-rss:0kB
Jan 3 21:30:27 ip-172-31-34-37 kernel: [ 1575.518686] oom_reaper: reaped process 16471 (memkiller), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Memory issues can be resolved by analyzing the logs, which are stored in the kernel log (/var/log/kern.log) or in the syslog (/var/log/syslog). You can analyze the logs manually with the grep command to find the cause of the memory issue. However, running grep itself needs memory, so it is recommended to store all your syslogs centrally on a separate server and perform the analysis there. You can also manually group processes and configure which ones should be killed first and which crucial ones must be kept running, but this is time-consuming, as the number of logs generated will be high.
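A minimal sketch of that manual grep pass, using the default log locations mentioned above:
grep -i 'out of memory' /var/log/kern.log /var/log/syslog
grep -Ei 'oom-killer|oom_reaper' /var/log/kern.log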
Alternatively, you can use a comprehensive log management solution such as EventLog Analyzer to centralize all your syslogs and automatically analyze them for better insights. The solution offers real-time alerts and predefined reports for low disk space, warning events, information events, etc.
A log management solution can be configured to trigger an alert when the system is running out of memory. This will help you to take immediate action so that crucial processes can be continued.
Check out how EventLog Analyzer can help you detect and resolve memory problems in the network. With 300+ predefined alert criteria, EventLog Analyzer can quickly identify security incidents and send real-time SMS or email notifications to the administrators.
Debug out-of-memory with /var/log/messages
It doesn't matter whether this problem is with httpd, mysqld, or postfix, but I am curious how I can continue debugging it. How can I get more info about why PID 9163 was killed? I am not sure whether Linux keeps a history of terminated PIDs somewhere. If this occurred in your message log file, how would you troubleshoot the issue step by step?
# free -m
             total       used       free     shared    buffers     cached
Mem:          1655        934        721          0         10         52
-/+ buffers/cache:         871        784
Swap:          109          6        103
2 Answers
The kernel will have logged a bunch of stuff before this happened, but most of it will probably not be in /var/log/messages, depending on how your (r)syslogd is configured. Try:
grep oom /var/log/*
grep total_vm /var/log/*
The former should show up a bunch of times and the latter in only one or two places. That is the file you want to look at.
Find the original "Out of memory" line in one of the files that also contains total_vm. Thirty seconds to a minute (could be more, could be less) before that line you'll find something like:
kernel: foobar invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
You should also find a table somewhere between that line and the "Out of memory" line with headers like this:
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
This may not tell you much more than you already know, but the fields are as follows (a small unit-conversion sketch follows the list):
- pid The process ID.
- uid User ID.
- tgid Thread group ID.
- total_vm Virtual memory use (in 4 kB pages)
- rss Resident memory use (in 4 kB pages)
- nr_ptes Page table entries
- swapents Swap entries
- oom_score_adj Usually 0; a lower number indicates the process will be less likely to die when the OOM killer is invoked.
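As a rough sketch (not part of the original answer), the page counts can be converted to MiB and sorted by resident memory. Here oom_table.txt is a hypothetical file holding just the per-process lines of that table:
# the name is the last field; rss and total_vm are counted from the end so a
# syslog prefix does not matter; each page is 4 kB
awk '$NF != "name" { printf "%-20s rss=%.1f MiB  total_vm=%.1f MiB\n", $NF, $(NF-4)*4/1024, $(NF-5)*4/1024 }' oom_table.txt | sort -t'=' -k2 -rn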
You can mostly ignore nr_ptes and swapents, although I believe these are factors in determining who gets killed. This is not necessarily the process using the most memory, but it very likely is. For more about the selection process, see here. Basically, the process that ends up with the highest OOM score is killed; that's the "score" reported on the "Out of memory" line. Unfortunately, the other scores aren't reported, but the table provides some clues about the contributing factors.
Again, this probably won't do much more than illuminate the obvious: the system ran out of memory and mysqld was chosen to die because killing it would release the most resources. This does not necessarily mean mysqld is doing anything wrong. You can look at the table to see if anything else went way out of line at the time, but there may not be any clear culprit: the system can run out of memory simply because you misjudged or misconfigured the running processes.