How to tune the Linux kernel with the /proc filesystem
The Linux kernel is a tunable marvel that allows you to make changes to its parameters while it is running and without requiring a reboot.
Linux is an amazing and powerful operating system. More specifically, the Linux kernel is the source of many of its superpowers. I have been using Linux for 25 years and have used a lot of versions of the Linux kernel. I have even compiled the kernel a time or two in class and a few times just for grins.
Most users and even sysadmins never need to compile the kernel. In most distributions, the default compile is perfectly fine for most use cases. But there are times when a bit of tuning is in order. The good news is that the kernel can be tuned easily without recompiling it or even rebooting.
This article is not intended to be a detailed exposition of all the kernel data available for viewing or modification. It is an homage to an incredible piece of software on its 31st anniversary.
What is the /proc filesystem?
The /proc filesystem is one of the most critical components of the kernel. It is a virtual filesystem that exists only in memory. The /proc filesystem is defined by the Linux Filesystem Hierarchy Standard (FHS) as the location for Linux to store information about the system, the kernel, and all processes running on the host. It is intended to be a place for the kernel to expose information about itself to facilitate access to data about the system for programmers, developers, and sysadmins.
Collecting this data does not impact the overall performance of a Linux host. The Linux kernel is designed to continuously collect and store the performance data that can be accessed and displayed by any and all performance monitoring tools. The tools access that data to read it and then manipulate and display it in a meaningful format. Because this data is already stored in the /proc filesystem, it avoids complex and time-consuming function calls to the kernel’s internals.
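As a rough illustration of how a monitoring tool can pull one number out of /proc, here is a minimal shell sketch. The helper name meminfo_kb is made up for this example, and the optional second argument exists only so the function can be tried on a saved copy of the file. Real tools such as free or top do considerably more than this.

```shell
# Print the numeric value of one field from a meminfo-style file.
# meminfo_kb is a hypothetical helper; the file defaults to /proc/meminfo.
# Lines look like "MemTotal:       16318620 kB", so field 1 is the name
# followed by a colon and field 2 is the number.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}
```

Running meminfo_kb MemAvailable on a live system reads the value straight from the kernel's own bookkeeping, with no special system call beyond opening and reading a file.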
Some of the data in the /proc filesystem is used to tune the kernel. The values in those files can be easily changed by simple and familiar Linux tools.
Viewing the data
When used as a window into the state of the operating system and its view of the system and hardware, the /proc filesystem provides easy access to virtually every bit of information you might want as a sysadmin. All the cool tools that sysadmins use to view the status of the operating system depend on the kernel to supply the needed data through /proc.
Start by viewing some of the available data. First, make /proc the present working directory (PWD) and list the contents. Many of these are directories and others are files. I use the Konsole terminal emulator. The default color for directories is blue, and the files are the terminal text color, which I set to amber. Symbolic links are cyan.
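The numbered directories in that listing are one per running process. A small sketch of picking them out follows; list_pids is a hypothetical helper name, and its argument defaults to /proc but can point anywhere, which makes it easy to try on a scratch directory.

```shell
# Print the entries under a directory whose names start with a digit and
# that are directories. In /proc these are the per-process directories.
# list_pids is a hypothetical helper; the directory defaults to /proc.
list_pids() {
    for entry in "${1:-/proc}"/[0-9]*; do
        [ -d "$entry" ] && basename "$entry"
    done
    return 0
}
```

Calling list_pids with no argument reads /proc itself, and piping the result through wc -l gives a quick count of processes.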
Linux Kernel Tuning via /proc system
Linux copied the Solaris /proc system and extended its functionality into a performance tuning center. Tuning is usually accomplished via system control variables stored in /proc/sys. Unlike most other areas of /proc, those variables are typically writable, and are used to adjust the running kernel rather than simply monitor currently running processes and system information.
The sysctl interface allows administrators to modify variables that the kernel uses to determine behavior. There are two ways to work with sysctl: by directly reading and modifying files in /proc/sys and by using the sysctl program supplied with most distributions. Most documentation on sysctl accesses variables via the /proc/sys file system, and does so using cat for viewing and echo for changing variables, as shown in the following example where IP forwarding is enabled:
# cat /proc/sys/net/ipv4/ip_forward
0
# echo "1" > /proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
1
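The same write-then-read-back pattern can be wrapped in a small function that refuses to proceed when the file is not writable. This is only a sketch: set_tunable is a hypothetical name, and writing anything under /proc/sys normally requires root.

```shell
# Write a value to a tunable file and print what it contains afterwards.
# set_tunable is a hypothetical helper. It returns non-zero if the file
# is missing or not writable, or if the kernel rejects the write.
set_tunable() {
    file=$1
    value=$2
    if [ ! -w "$file" ]; then
        echo "cannot write $file (missing, read-only, or not root)" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$file" && cat "$file"
}
```

As root, set_tunable /proc/sys/net/ipv4/ip_forward 1 would perform the echo and the second cat from the example in one step.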
An alternative is to use the sysctl program, which provides a command-line interface to the same variables. With the sysctl program, you specify the variable as a path relative to /proc/sys.
To start getting a taste of what sysctl can modify, run sysctl -a and you will see all the possible parameters. The list can be quite long: in my current box there are 712 possible settings.
To list all variables that start with «vm.», run sysctl vm. Add the -n option to output just the variable values, without the names; -N has the opposite effect, and produces the names but not the values.
You can change any variable by using the -w option with the syntax sysctl -w variable=value. For example, sysctl -w net.ipv6.conf.all.forwarding=1 sets the corresponding variable to true (0 equals «no» or «false»; 1 means «yes» or «true»), thus allowing IPv6 forwarding.
For more information, run man sysctl and see the post Using Sysctl To Change Kernel Tunables On Linux:
sysctl is a very versatile command and can be used either in its standalone form, or through the modification of the /etc/sysctl.conf file. First, we’ll take a brief look at what the sysctl standalone command can do. It doesn’t have too many options, so explaining them up front will make the rest easier to follow. You can run sysctl with the following flags (maybe more, depending on your distro):
-a to display all the tunable key values currently available
-A to display all the tunable key values currently available (in current procps versions this is simply an alias for -a)
-e to ignore errors about unknown keys
-n to «not» print the key names when printing out values
-N to «only» print the key names and forgo printing their values
-p (sometimes -P) to import and apply settings from a specified file. This option will use /etc/sysctl.conf as the default if no file name argument is provided on the command line
-q for your standard quiet mode
-w to change kernel tunable (sysctl) settings. The change takes effect in real time but is not written to the /etc/sysctl.conf file, so it will not survive a reboot unless you also add it there
and two more «special» arguments:
Some basic examples of sysctl’s use would include:
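A few read-only invocations, offered as a sketch rather than a complete tour. The key kernel.ostype is chosen only because it is harmless to read, and the whole block is skipped where the sysctl program is not installed. Nothing here changes kernel state.

```shell
# Read-only examples using the flags described above.
if command -v sysctl >/dev/null 2>&1; then
    sysctl kernel.ostype                    # one key, printed as "name = value"
    sysctl -n kernel.ostype                 # the value alone, without the name
    sysctl -N -a 2>/dev/null | head -n 3    # the first few key names, no values
fi
```

Changing a value follows the same shape with -w and a key=value argument, but that requires root and is best tried on a test machine first.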
SPECIAL ERRATA NOTICE: If sysctl -a spews a lot of kernel warnings, check out Advisory RHBA-2008:0020-4 on RedHat’s website for a patch to fix that issue.
It’s interesting, also, to note that, while sysctl will work just fine with an /etc/sysctl.conf file that includes nothing but comments (or is completely non-existent), your /proc filesystem «must» be of the type «procfs» in order for it to function correctly. This is picking a nit, really, since you’d have to go out of your way to build your RedHat Linux box to use (for instance) ext3 for the /proc filesystem, but it is a bit of information that’s good to know (maybe, at some point in the future). /proc/sys is the base directory for sysctl. In fact, if you wanted a rough equivalent of the key names that «sysctl -a» prints, you could just do a recursive ls in that directory.
Tomorrow, or sometime later this week, we’ll take a look at some of the kernel tunables you’ll probably want to change, or may have to modify, most often with sysctl and, with as even a hand as possible, debate the pros and cons of some of the more «impactful» values that you can mess with.
Old News
[Sep 09, 2008] Linux.com Kernel tuning with sysctl by Federico Kereki
The Linux kernel is flexible, and you can even modify the way it works on the fly by dynamically changing some of its parameters, thanks to the sysctl command. Sysctl provides an interface that allows you to examine and change several hundred kernel parameters in Linux or BSD. Changes take effect immediately, and there’s even a way to make them persist after a reboot. By using sysctl judiciously, you can optimize your box without having to recompile your kernel, and get the results immediately.
To start getting a taste of what sysctl can modify, run sysctl -a and you will see all the possible parameters. The list can be quite long: in my current box there are 712 possible settings.
If you want to get the value of just a single variable, use something like sysctl vm.swappiness, or just sysctl vm to list all variables that start with «vm.» Add the -n option to output just the variable values, without the names; -N has the opposite effect, and produces the names but not the values.
You can change any variable by using the -w option with the syntax sysctl -w variable=value. For example, sysctl -w net.ipv6.conf.all.forwarding=1 sets the corresponding variable to true (0 equals «no» or «false»; 1 means «yes» or «true»), thus allowing IPv6 forwarding. You may not even need the -w option; recent versions seem to accept a bare variable=value argument. Do some experimenting on your own to confirm that.
For more information, run man sysctl to display the standard documentation.
sysctl and the /proc directory
The /proc/sys virtual directory also provides an interface to the sysctl parameters, allowing you to examine and change them. For example, the /proc/sys/vm/swappiness file is equivalent to the vm.swappiness parameter in sysctl.conf; just forget the initial «/proc/sys/» part, substitute dots for the slashes, and you get the corresponding sysctl parameter. (By the way, the substitution is not actually required; slashes are also accepted, though it seems everybody goes for the notation with the dots instead.) Thus, echo 10 >/proc/sys/vm/swappiness is exactly the same as sysctl -w vm.swappiness=10 . But as a rule of thumb, if a /proc/sys file is read-only, you cannot set it with sysctl either.
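The naming rule is mechanical enough to write down as two tiny functions. Both names are hypothetical, and the sketch deliberately ignores keys that contain a literal dot of their own (a VLAN interface such as eth0.100, for instance), where the simple swap would give the wrong answer.

```shell
# Convert between a /proc/sys path and the dotted sysctl name.
# path_to_key and key_to_path are hypothetical helper names.
# Keys with a literal dot in one component are not handled.
path_to_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}
```

So path_to_key /proc/sys/vm/swappiness yields vm.swappiness, and key_to_path net.ipv4.ip_forward yields /proc/sys/net/ipv4/ip_forward.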
sysctl values are loaded at boot time from the /etc/sysctl.conf file. This file can have blank lines, comments (lines starting either with a «#» character or a semicolon), and lines in the «variable=value» format. For example, my own sysctl.conf file is listed below. If you want to apply it at any time, you can do so with the command sysctl -p .
# Disable response to broadcasts.
net.ipv4.icmp_echo_ignore_broadcasts = 1
# enable route verification on all interfaces
net.ipv4.conf.all.rp_filter = 1
# enable ipV6 forwarding
net.ipv6.conf.all.forwarding = 1
# increase the number of possible inotify(7) watches
fs.inotify.max_user_watches = 65536
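Before applying a file like this with sysctl -p, it can help to see exactly which settings it contains once comments and blank lines are stripped. The following is a minimal sketch of the file format described above. conf_settings is a hypothetical helper name, it only prints and never applies anything, and it does not attempt to cover every detail the real sysctl parser handles.

```shell
# Print the settings in a sysctl.conf-style file as key=value lines,
# skipping blank lines and comments that start with "#" or ";".
# conf_settings is a hypothetical helper; the file defaults to
# /etc/sysctl.conf.
conf_settings() {
    sed -e 's/^[[:space:]]*//' \
        -e '/^$/d' \
        -e '/^[#;]/d' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' \
        -e 's/[[:space:]]*$//' "${1:-/etc/sysctl.conf}"
}
```

For the file listed above, the output would be four lines, one per setting, with the spaces around each equals sign removed.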
Getting somewhere?
With so many tunable parameters, how do you decide what to do? Alas, this is a sore point with sysctl: most of the relevant documentation is hidden in the many source files of the Linux kernel, and isn’t easily available, and it doesn’t help that the explanations given are sometimes arcane and difficult to understand. You may find something in the /usr/src/linux/Documentation/sysctl directory, but most (if not all) files there refer to kernel 2.2, and seemingly haven’t been updated in the last several years.
Looking around for books on the subject probably won’t help much. I found hack #71 in O’Reilly’s Linux Server Hacks, Volume 2, from 2005, but that was about it. Several other books include references to sysctl, but as to specific parameters or hints, you are on your own.
As an experiment, I tried looking for information on the swappiness parameter, which can optimize virtual memory management. The /usr/src/linux/Documentation/sysctl/vm.txt file didn’t even refer to it, probably because this parameter appeared around version 2.6 of the kernel. Doing a general search in the complete /usr/src/linux directory turned up five files that mention «swappiness»: three «include» (.h) files in include/linux, plus kernel/sysctl.c and mm/vmscan.c. The latter file included the information:
/*
 * From 0 .. 100.  Higher means more swappy.
 */
int vm_swappiness = 60;
That was it! You can see the default value (60) and a minimal reference to the field meaning. How helpful is that?
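The search itself is easy to repeat for any other parameter. A minimal sketch follows, assuming a kernel source tree is unpacked somewhere on disk. files_mentioning is a hypothetical helper name, and /usr/src/linux is simply the path used in this article, which may not exist on a given system.

```shell
# List the files under a source tree that mention a term, one per line.
# files_mentioning is a hypothetical helper; the tree defaults to
# /usr/src/linux. Unreadable files and a missing tree are ignored.
files_mentioning() {
    grep -rl -- "$1" "${2:-/usr/src/linux}" 2>/dev/null | sort
}
```

With a tree in place, files_mentioning swappiness prints the matching paths, and opening each one is usually the quickest route to whatever comments the kernel developers left.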
My suggestion would be to use sysctl -a to learn the available parameters, then Google around for extra help. You may find, say, an example of changing the shared memory allocation to solve a video program problem, or an explanation on vm.swappiness, or even more suggestions for optimizing IPv4 network traffic.
sysctl shows yet another aspect of the great flexibility of Linux systems. While documentation for it is not widely available, learning its features and capabilities on your own can help you get even more performance out of your box. That’s system administration at its highest (or lowest?) level.
Read in the original layout at: http://www.linux.com/feature/146599
Kernel tuning with sysctl
Posted by: Anonymous [ip: 96.14.205.198] on September 09, 2008 05:26 PM
There won’t be a whole book on sysctl anytime soon—it would be about 14 pages, including title, copyright, TOC, and index. The actual parameters have changed dramatically over time, not only in availability, but also in interpretation. A setting that’s valid in a 2.4 kernel might be gone in 2.6, or might have a different set of valid values.