Linux module with proc
In the first post we built a simple kernel module with init and exit functions and covered the basic concepts of kernel programming.
Next, we added kernel module parameters to configure the kernel module data.
In this post, we will create the first interface to a user space application using a procfs (/proc) file.
Proc File System
Proc is a pseudo file system for interfacing with the kernel's internal data structures. As a user, you can use proc files for system diagnostics: CPU, memory, interrupts and many more. You can also configure many parameters, such as scheduler parameters, kernel objects, memory and more.
The common interaction with proc is using cat and echo from the shell. For example:
    # cat /proc/cpuinfo
    # echo "50" > /proc/sys/kernel/sched_rr_timeslice_ms
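The same interaction works from a program, because proc entries are opened, read and written like ordinary files. Below is a minimal user space sketch of what cat and echo do; the helper names read_first_line and write_string are made up for this illustration and are not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into out (at most out_size - 1 bytes).
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *out, size_t out_size)
{
    FILE *f;

    if (out_size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(out, (int)out_size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline, if any */
    return 0;
}

/* Write a string to a file, roughly what `echo value > path` does
 * (echo also appends a newline). Returns 0 on success, -1 on failure. */
int write_string(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fputs(value, f) >= 0);
    if (fclose(f) != 0)               /* errors may only show up at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, read_first_line("/proc/cpuinfo", line, sizeof line) fetches the first line of the CPU report, and write_string on a file under /proc/sys changes a tunable, provided the process has permission to write it.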
Creating a new Proc file
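To create a proc file we need to implement a simple interface: struct file_operations.
It has room for more than 20 callbacks, but the common operations are read and write. To register the interface, use the function proc_create. (On kernels 5.6 and later, proc_create takes a struct proc_ops with proc_read and proc_write members instead of a struct file_operations.)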
    #include <linux/module.h>
    #include <linux/moduleparam.h>
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/proc_fs.h>
    #include <linux/uaccess.h>

    #define BUFSIZE 100

    MODULE_LICENSE("Dual BSD/GPL");
    MODULE_AUTHOR("Liran B.H");

    static struct proc_dir_entry *ent;

    static ssize_t mywrite(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos)
    {
        printk(KERN_DEBUG "write handler\n");
        return -1;   /* -1 is -EPERM: "Operation not permitted" */
    }

    static ssize_t myread(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
    {
        printk(KERN_DEBUG "read handler\n");
        return 0;    /* 0 means end of file */
    }

    static struct file_operations myops =
    {
        .owner = THIS_MODULE,
        .read = myread,
        .write = mywrite,
    };

    static int simple_init(void)
    {
        ent = proc_create("mydev", 0660, NULL, &myops);
        return 0;
    }

    static void simple_cleanup(void)
    {
        proc_remove(ent);
    }

    module_init(simple_init);
    module_exit(simple_cleanup);
If you build and insert the module, you will see a new file, /proc/mydev. You can test the read and write operations using cat and echo; for now they only produce kernel log messages:
    # echo "test" > /proc/mydev
    bash: echo: write error: Operation not permitted
    # cat /proc/mydev
    # dmesg | tail -2
    [ 694.640306] write handler
    [ 714.661465] read handler
Implementing The Read Handler
The read handler receives 4 parameters:
- File object: a per-process structure with the opened file details (permissions, position, etc.)
- User space buffer
- Buffer size
- Requested position (in and out parameter)
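To implement the read callback we need to check the requested position, fill the user buffer with up to the requested number of bytes starting at that position, update the position and return the number of bytes we copied.
For example, suppose the user runs the following code: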
    int fd = open("/proc/mydev", O_RDWR);
    len = read(fd, buf, 100);
    len = read(fd, buf, 50);
On the first call to read we get the user buffer, size = 100 and position = 0. We need to fill the buffer with up to 100 bytes starting from position 0, update the position and return the number of bytes we wrote. If we filled the buffer with 100 bytes and returned 100, then on the next call to read we get the user buffer, size = 50 and position = 100.
Suppose we have 2 module parameters and we want to return their values from the proc read handler. We write the following:
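The position bookkeeping can be tried out in user space before writing any kernel code. The following is only a model of the contract described above, not kernel code: model_read is a hypothetical name, a plain memcpy stands in for copy_to_user, and a long stands in for loff_t. The kernel offers a helper with the same behaviour, simple_read_from_buffer.

```c
#include <stddef.h>
#include <string.h>

/* User space model of a read handler: copy up to count bytes of data,
 * starting at *ppos, into ubuf; advance *ppos; return the number of
 * bytes copied. A return value of 0 means end of file. */
long model_read(const char *data, size_t data_len,
                char *ubuf, size_t count, long *ppos)
{
    size_t avail, n;

    if (*ppos < 0)
        return -1;                     /* invalid position */
    if ((size_t)*ppos >= data_len)
        return 0;                      /* nothing left: end of file */

    avail = data_len - (size_t)*ppos;
    n = count < avail ? count : avail; /* never copy more than requested */
    memcpy(ubuf, data + *ppos, n);
    *ppos += (long)n;                  /* the next call continues from here */
    return (long)n;
}
```

With 150 bytes of data, a call with count = 100 returns 100 and leaves the position at 100, a second call with count = 50 returns 50, and a third call returns 0.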
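    static int irq = 20;
    module_param(irq, int, 0660);

    static int mode = 1;
    module_param(mode, int, 0660);

    static ssize_t myread(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
    {
        char buf[BUFSIZE];
        int len = 0;

        printk(KERN_DEBUG "read handler\n");
        /* only answer the first read, and only if the user buffer is large enough */
        if (*ppos > 0 || count < BUFSIZE)
            return 0;
        len += sprintf(buf, "irq = %d\n", irq);
        len += sprintf(buf + len, "mode = %d\n", mode);

        if (copy_to_user(ubuf, buf, len))
            return -EFAULT;
        *ppos = len;
        return len;
    }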
This is a simple implementation. We check that this is the first call to read (position 0) and that the user buffer size is at least BUFSIZE; otherwise we return 0 (end of file).
Then we build the returned buffer, copy it to the user, update the position and return the number of bytes we wrote.
Build and insert the module. You can test it with the cat command:
    # sudo insmod ./simproc.ko irq=32 mode=4
    # cat /proc/mydev
    irq = 32
    mode = 4
Exchanging Data With User-Space
In the kernel code, you can’t just use memcpy between an address supplied by user-space and the address of a buffer in kernel-space:
- The two addresses belong to completely different address spaces (thanks to virtual memory).
- The user-space address may be swapped out to disk.
- The user-space address may be invalid (user space process trying to access unauthorized data).
You must use dedicated functions in your read and write file operations code:
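    #include <linux/uaccess.h>

    unsigned long copy_to_user(void __user *to, const void *from, unsigned long n);
    unsigned long copy_from_user(void *to, const void __user *from, unsigned long n);

Both functions return the number of bytes that could not be copied, so a return value of zero means success.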
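The return convention is easy to get backwards, so here is a small user space model of it. This is not how the kernel implements the copy; struct user_region and model_copy_from_user are hypothetical names invented for this sketch, with the region standing in for the memory the calling process is allowed to access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the part of memory a process may access. */
struct user_region {
    const char *base;
    size_t len;
};

/* Model of the copy_from_user() contract: copy as many of the n requested
 * bytes as lie inside the region, zero the rest of the destination and
 * return the number of bytes that could NOT be copied (0 = success). */
unsigned long model_copy_from_user(void *to, const struct user_region *r,
                                   const char *from, unsigned long n)
{
    uintptr_t start = (uintptr_t)r->base;
    uintptr_t end = start + r->len;
    uintptr_t p = (uintptr_t)from;
    unsigned long ok = 0;

    if (p >= start && p < end) {
        unsigned long room = (unsigned long)(end - p);
        ok = n < room ? n : room;      /* bytes that are valid to read */
    }
    if (ok > 0)
        memcpy(to, from, ok);
    memset((char *)to + ok, 0, n - ok); /* the uncopied tail is zeroed */
    return n - ok;
}
```

This is why the handlers in this post test the result with if (copy_from_user(...)) return -EFAULT; any non-zero value means part of the user buffer was not accessible.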
Implementing the Write Handler
The write handler is similar to the read handler. The only difference is that the user buffer type is a const char pointer. We need to copy the data from the user buffer to the requested position and return the number of bytes copied
In this example we want to set both values with a single write of two integers to the file.
The first value is the irq number and the second is the mode.
The code for the write handler:
    static ssize_t mywrite(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos)
    {
        int num, c, i, m;
        char buf[BUFSIZE];

        /* accept a single write from position 0 and leave room for the terminating NUL */
        if (*ppos > 0 || count >= BUFSIZE)
            return -EFAULT;
        if (copy_from_user(buf, ubuf, count))
            return -EFAULT;
        buf[count] = '\0';   /* sscanf and strlen need a terminated string */
        num = sscanf(buf, "%d %d", &i, &m);
        if (num != 2)
            return -EFAULT;
        irq = i;
        mode = m;
        c = strlen(buf);
        *ppos = c;
        return c;
    }
Again, we check that this is the first call to write (position 0), then we use copy_from_user to copy the data from the user address space to the kernel address space. We extract the values, check for errors, update the position and return the number of bytes we received.
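The parsing step can be exercised in user space. The sketch below mirrors the handler's logic under a few assumptions: parse_irq_mode is a hypothetical helper name, memcpy stands in for copy_from_user, and errors are reported as -1 instead of a kernel error code. It also shows why the buffer has to be NUL-terminated before sscanf is called: the data arrives as a pointer and a length, not as a C string.

```c
#include <stdio.h>
#include <string.h>

#define BUFSIZE 100

/* Parse "irq mode" from a length-delimited buffer.
 * Returns 0 and stores both values on success, -1 on any error;
 * *irq and *mode are left untouched on error. */
int parse_irq_mode(const char *data, size_t count, int *irq, int *mode)
{
    char buf[BUFSIZE];
    int i, m;

    if (count >= BUFSIZE)              /* keep one byte for the terminator */
        return -1;
    memcpy(buf, data, count);          /* copy_from_user() in the kernel */
    buf[count] = '\0';                 /* the input is not a C string yet */

    if (sscanf(buf, "%d %d", &i, &m) != 2)
        return -1;
    *irq = i;
    *mode = m;
    return 0;
}
```

Calling parse_irq_mode("32 6\n", 5, &irq, &mode) stores 32 and 6, which is what the shell sends for echo "32 6".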
The complete module:

    #include <linux/module.h>
    #include <linux/moduleparam.h>
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/proc_fs.h>
    #include <linux/uaccess.h>

    #define BUFSIZE 100

    MODULE_LICENSE("Dual BSD/GPL");
    MODULE_AUTHOR("Liran B.H");

    static int irq = 20;
    module_param(irq, int, 0660);

    static int mode = 1;
    module_param(mode, int, 0660);

    static struct proc_dir_entry *ent;

    static ssize_t mywrite(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos)
    {
        int num, c, i, m;
        char buf[BUFSIZE];

        /* accept a single write from position 0 and leave room for the terminating NUL */
        if (*ppos > 0 || count >= BUFSIZE)
            return -EFAULT;
        if (copy_from_user(buf, ubuf, count))
            return -EFAULT;
        buf[count] = '\0';   /* sscanf and strlen need a terminated string */
        num = sscanf(buf, "%d %d", &i, &m);
        if (num != 2)
            return -EFAULT;
        irq = i;
        mode = m;
        c = strlen(buf);
        *ppos = c;
        return c;
    }

    static ssize_t myread(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
    {
        char buf[BUFSIZE];
        int len = 0;

        /* only answer the first read, and only if the user buffer is large enough */
        if (*ppos > 0 || count < BUFSIZE)
            return 0;
        len += sprintf(buf, "irq = %d\n", irq);
        len += sprintf(buf + len, "mode = %d\n", mode);

        if (copy_to_user(ubuf, buf, len))
            return -EFAULT;
        *ppos = len;
        return len;
    }

    static struct file_operations myops =
    {
        .owner = THIS_MODULE,
        .read = myread,
        .write = mywrite,
    };

    static int simple_init(void)
    {
        ent = proc_create("mydev", 0660, NULL, &myops);
        printk(KERN_ALERT "hello...\n");
        return 0;
    }

    static void simple_cleanup(void)
    {
        proc_remove(ent);
        printk(KERN_WARNING "bye ...\n");
    }

    module_init(simple_init);
    module_exit(simple_cleanup);
Note: to implement more complex proc entries, use the seq_file wrapper.
User Space Application
You can open the file and use the read/write functions to test the module. Don't forget to move the position back to 0 after each operation:
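    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>

    int main(void)
    {
        char buf[101];
        ssize_t n;
        int fd = open("/proc/mydev", O_RDWR);

        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }
        n = read(fd, buf, 100);
        buf[n > 0 ? n : 0] = '\0';   /* read does not terminate the buffer */
        puts(buf);

        lseek(fd, 0, SEEK_SET);
        write(fd, "33 4", 5);

        lseek(fd, 0, SEEK_SET);
        n = read(fd, buf, 100);
        buf[n > 0 ? n : 0] = '\0';
        puts(buf);

        close(fd);
        return 0;
    }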
You can find the full source code here
5.1. The /proc File System
In Linux, there is an additional mechanism for the kernel and kernel modules to send information to processes: the /proc file system. Originally designed to allow easy access to information about processes (hence the name), it is now used by every bit of the kernel which has something interesting to report, such as /proc/modules, which provides the list of modules, and /proc/meminfo, which reports memory usage statistics.
The method to use the proc file system is very similar to the one used with device drivers — a structure is created with all the information needed for the /proc file, including pointers to any handler functions (in our case there is only one, the one called when somebody attempts to read from the /proc file). Then, init_module registers the structure with the kernel and cleanup_module unregisters it.
The reason we use proc_register_dynamic [1] is because we don’t want to determine the inode number used for our file in advance, but to allow the kernel to determine it to prevent clashes. Normal file systems are located on a disk, rather than just in memory (which is where /proc is), and in that case the inode number is a pointer to a disk location where the file’s index-node (inode for short) is located. The inode contains information about the file, for example the file’s permissions, together with a pointer to the disk location or locations where the file’s data can be found.
Because we don't get called when the file is opened or closed, there's nowhere for us to put try_module_get and module_put in this module, and if the file is opened and then the module is removed, there's no way to avoid the consequences.
Here is a simple example showing how to use a /proc file. This is the HelloWorld for the /proc filesystem. There are three parts: create the file /proc/helloworld in the function init_module, return a value (and a buffer) when the file /proc/helloworld is read in the callback function procfile_read, and delete the file /proc/helloworld in the function cleanup_module.
The /proc/helloworld is created when the module is loaded with the function create_proc_entry . The return value is a ‘struct proc_dir_entry *’, and it will be used to configure the file /proc/helloworld (for example, the owner of this file). A null return value means that the creation has failed.
Each time the file /proc/helloworld is read, the function procfile_read is called. Two parameters of this function are very important: the buffer (the first parameter) and the offset (the third one). The content of the buffer will be returned to the application which read it (for example the cat command). The offset is the current position in the file. If the return value of the function isn't zero, then this function is called again. So be careful with this function: if it never returns zero, the read function is called endlessly.
    % cat /proc/helloworld
    HelloWorld!
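The reason the handler must eventually return zero becomes clearer from the reader's side. A program such as cat typically reads in a loop along the lines of the following sketch, which only stops when read reports end of file; read_until_eof is a hypothetical helper written for this illustration, not the actual source of cat.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until it returns 0 (end of file) or the output
 * buffer is full. Returns the total number of bytes read, or -1 on error. */
ssize_t read_until_eof(int fd, char *out, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, out + total, cap - total);

        if (n == 0)
            break;                     /* the handler reported end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;            /* got data, so ask for more */
    }
    return (ssize_t)total;
}
```

A handler that returns a positive value on every call never lets such a loop reach the break, and the reader keeps receiving the same text until its own buffer fills up or it is interrupted.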
    /*
     *  procfs1.c -  create a "file" in /proc
     *
     */

    #include <linux/module.h>   /* Specifically, a module */
    #include <linux/kernel.h>   /* We're doing kernel work */
    #include <linux/proc_fs.h>  /* Necessary because we use the proc fs */

    #define procfs_name "helloworld"

    /**
     * This structure hold information about the /proc file
     *
     */
    struct proc_dir_entry *Our_Proc_File;

    /* Put data into the proc fs file.
     *
     * Arguments
     * =========
     * 1. The buffer where the data is to be inserted, if
     *    you decide to use it.
     * 2. A pointer to a pointer to characters. This is
     *    useful if you don't want to use the buffer
     *    allocated by the kernel.
     * 3. The current position in the file
     * 4. The size of the buffer in the first argument.
     * 5. Write a "1" here to indicate EOF.
     * 6. A pointer to data (useful in case one common
     *    read for multiple /proc/... entries)
     *
     * Usage and Return Value
     * ======================
     * A return value of zero means you have no further
     * information at this time (end of file). A negative
     * return value is an error condition.
     *
     * For More Information
     * ====================
     * The way I discovered what to do with this function
     * wasn't by reading documentation, but by reading the
     * code which used it. I just looked to see what uses
     * the get_info field of proc_dir_entry struct (I used a
     * combination of find and grep, if you're interested),
     * and I saw that it is used in /fs/proc/array.c.
     *
     * If something is unknown about the kernel, this is
     * usually the way to go. In Linux we have the great
     * advantage of having the kernel source code for
     * free - use it.
     */
    int
    procfile_read(char *buffer,
                  char **buffer_location,
                  off_t offset, int buffer_length, int *eof, void *data)
    {
        int ret;

        printk(KERN_INFO "procfile_read (/proc/%s) called\n", procfs_name);

        /*
         * We give all of our information in one go, so if the
         * user asks us if we have more information the
         * answer should always be no.
         *
         * This is important because the standard read
         * function from the library would continue to issue
         * the read system call until the kernel replies
         * that it has no more information, or until its
         * buffer is filled.
         */
        if (offset > 0) {
            /* we have finished to read, return 0 */
            ret = 0;
        } else {
            /* fill the buffer, return the buffer size */
            ret = sprintf(buffer, "HelloWorld!\n");
        }

        return ret;
    }

    int init_module()
    {
        Our_Proc_File = create_proc_entry(procfs_name, 0644, NULL);

        if (Our_Proc_File == NULL) {
            remove_proc_entry(procfs_name, &proc_root);
            printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
                   procfs_name);
            return -ENOMEM;
        }

        Our_Proc_File->read_proc = procfile_read;
        Our_Proc_File->owner     = THIS_MODULE;
        Our_Proc_File->mode      = S_IFREG | S_IRUGO;
        Our_Proc_File->uid       = 0;
        Our_Proc_File->gid       = 0;
        Our_Proc_File->size      = 37;

        printk(KERN_INFO "/proc/%s created\n", procfs_name);
        return 0;   /* everything is ok */
    }

    void cleanup_module()
    {
        remove_proc_entry(procfs_name, &proc_root);
        printk(KERN_INFO "/proc/%s removed\n", procfs_name);
    }
Notes
[1] proc_register_dynamic is the version 2.0 interface; in version 2.2 this is done automatically if we set the inode to zero.