Read/write files within a Linux kernel module
I know all the discussions about why one should not read/write files from the kernel, and how to use /proc or netlink instead. I want to read/write anyway. I have also read Driving Me Nuts - Things You Never Should Do in the Kernel. However, the problem is that 2.6.30 does not export sys_read(). Rather, it is wrapped in SYSCALL_DEFINE3. So if I use it in my module, I get the following warnings:
WARNING: "sys_read" [xxx.ko] undefined!
WARNING: "sys_open" [xxx.ko] undefined!
- How to read/write within kernel after 2.6.22 (where sys_read() / sys_open() are not exported)?
- In general, how to use system calls wrapped in macro SYSCALL_DEFINEn() from within the kernel?
You should be aware that you should avoid file I/O from within Linux kernel when possible. The main idea is to go «one level deeper» and call VFS level functions instead of the syscall handler directly:
#include <linux/fs.h>
#include <asm/segment.h>
#include <asm/uaccess.h>
#include <linux/buffer_head.h>
Opening a file (similar to open):
struct file *file_open(const char *path, int flags, int rights)
{
    struct file *filp = NULL;
    mm_segment_t oldfs;
    int err = 0;

    oldfs = get_fs();
    set_fs(get_ds());
    filp = filp_open(path, flags, rights);
    set_fs(oldfs);
    if (IS_ERR(filp)) {
        err = PTR_ERR(filp);
        return NULL;
    }
    return filp;
}
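The IS_ERR()/PTR_ERR() check in file_open exists because filp_open() reports failure through an error-encoded pointer rather than NULL. The following is a userspace model of that convention, not the kernel's own code: the `model_` names and `MODEL_MAX_ERRNO` are hypothetical stand-ins chosen for this sketch, and it assumes pointers and `long` have the same width.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: the top 4095 addresses are reserved for errors. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the reserved range. */
void *model_err_ptr(long err)
{
    assert(err < 0 && err >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the reserved error range. */
int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}
```

In this model, a NULL pointer is not an error pointer, which is why a plain NULL comparison on the result of filp_open() would miss a failed open.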
Close a file (similar to close):
void file_close(struct file *file)
Reading data from a file (similar to pread):
int file_read(struct file *file, unsigned long long offset, unsigned char *data, unsigned int size)
Writing data to a file (similar to pwrite):
int file_write(struct file *file, unsigned long long offset, unsigned char *data, unsigned int size)
Syncing changes to a file (similar to fsync):
int file_sync(struct file *file)
[Edit] Originally, I proposed using file_fsync, which is gone in newer kernel versions. Thanks to the person who suggested the change; unfortunately, the edit was rejected before I could review it.
Thank you. I was thinking to do something similar by replicating sys_read/sys_open functionality. But this is great help. A curiosity, is there any way to use system calls declared using SYSCALL_DEFINE?
I tried this code in kernel 2.6.30 (Ubuntu 9.04) and reading the file crashes the system. Anyone experienced the same issue?
@Enrico Detoma Oh, wow. Is there any way that you can give me the module you used? Never seen that before.
That immediately raises the question of «why are you doing that FS dance, btw», which is answered quite nicely here: linuxjournal.com/node/8110/print under the «Fixing the Address Space» section.
Since version 4.14 of the Linux kernel, the vfs_read and vfs_write functions are no longer exported for use in modules. Instead, functions intended specifically for file access from the kernel are provided:
/* Read the file from kernel space. */
ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
/* Write the file from kernel space. */
ssize_t kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos);
Also, filp_open no longer accepts a user-space string, so it can be used for kernel access directly (without the set_fs dance).
What alternatives do we have other than filp_open ? I know we shouldn’t do file operations on userspace files through the kernel, but let’s say we want to.
When I try to open a userspace file, it just fails, claiming it was trying to dereference a null pointer, probably because it cannot access a file that is placed in userspace area. E.g. ~/.text
There is no such thing as a «file placed in userspace area». All files (including the ones under ~) are stored in a single namespace. But ~ is a shell concept: this character is not processed by the kernel or by non-shell programs. The kernel is not even aware of a user's home directory: this concept is maintained by the user-space part of the OS. To access a file under a user's home directory from the kernel, you need to specify that directory as a «normal» path, e.g. /home/tester/.text.
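The point about ~ can be checked from user space, since open() hands the path to the kernel unchanged. The sketch below uses two hypothetical helpers written for this illustration: `open_errno()` reports what the kernel says about a path, and `expand_tilde()` does roughly what a shell does before the kernel ever sees the path. It assumes the current directory contains no entry literally named "~".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to open a path read-only; return 0 on success or the errno on failure. */
int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Replace a leading "~/" with the given home directory, as a shell would.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_tilde(const char *path, const char *home, char *out, size_t n)
{
    int len;

    assert(path && home && out);
    if (strncmp(path, "~/", 2) == 0)
        len = snprintf(out, n, "%s%s", home, path + 1);
    else
        len = snprintf(out, n, "%s", path);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}
```

Passing "~/.text" straight to open_errno() makes the kernel look for a directory named "~" relative to the current directory, while expand_tilde("~/.text", "/home/tester", ...) yields the «normal» path the answer recommends.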
yea I mean ~/.text was just an example, any other abs path that I provide doesn’t seem to work at all, after debugging the kernel seems to abort with dereferencing a null pointer and it is indeed the first argument that causes it, but if you say so then ok.
Read file in linux kernel
In Linux kernel development, particularly in the embedded field, you will sooner or later need to access files in the file system from kernel code.
However, the kernel has no counterpart to the user-mode file IO and standard IO interfaces, so you cannot simply call open()/fopen(), read()/fread(), write()/fwrite(), or close()/fclose().
Fortunately, ./kernel/include/linux/fs.h provides corresponding functions for performing IO operations on ordinary files in the file system.
This set of functions can be understood as a «file IO» interface for kernel mode.
1. filp_open() function
The function prototype is as follows:
struct file *filp_open(const char *, int, umode_t);
Parameter 1 is the path of the file to open. Give the path as it appears in the file system, preferably as an absolute path.
Parameter 2 is the open mode of the file. Commonly used values are O_RDONLY, O_WRONLY, O_RDWR, O_CREAT. The values are the same as in user-mode file IO, and they are defined in ./kernel/include/uapi/asm-generic/fcntl.h.
Parameter 3 is the permission of the file, an octal value such as 0666 or 0755. In read-only mode, just pass 0.
The return value is a pointer to the structure of the opened file. This structure is defined in ./kernel/include/linux/fs.h.
2. filp_close() function
The function prototype is as follows:
int filp_close(struct file *, fl_owner_t id);
Parameter 1 is the return value of the filp_open() function.
Parameter 2 is generally filled with 0 (that is, NULL).
The return value indicates the closing result of this file, and the value 0 indicates successful closing.
3. vfs_read() function
The function prototype is as follows:
ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
Parameter 1 is the return value of the filp_open() function.
Parameter 2 is the buffer that receives the data read. Note that by default this parameter must point to a buffer allocated in user mode (hence the __user annotation). To use a buffer allocated in kernel mode, an extra step is required, which is described below.
Parameter 3 indicates the maximum size expected to be read.
Parameter 4 points to the read position. The read starts at this offset, and the function advances it by the number of bytes read, so that the next read continues where the previous one ended.
The return value is the size of the data actually read.
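The way parameter 4 carries the position from one call to the next can be modelled in plain user-space C. This is only a sketch of the offset bookkeeping over an in-memory buffer; `model_read()` is a hypothetical function invented for the illustration and does not call into the VFS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read up to count bytes from an in-memory "file" starting at *pos, then
 * advance *pos by the amount read. Returns the number of bytes copied,
 * or 0 at end of file. */
long model_read(const char *file, size_t file_len,
                char *buf, size_t count, long long *pos)
{
    size_t left;

    assert(file && buf && pos && *pos >= 0);
    if ((size_t)*pos >= file_len)
        return 0;                   /* end of file */
    left = file_len - (size_t)*pos;
    if (count > left)
        count = left;               /* short read near the end */
    memcpy(buf, file + *pos, count);
    *pos += (long long)count;       /* the next call continues from here */
    return (long)count;
}
```

Calling it repeatedly with the same pos variable walks through the data in chunks, and the return value shrinks to a short read and then to 0 once the end is reached.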
If parameter 2 is to point directly at a buffer allocated in kernel mode, the following code must be executed before calling this function:
mm_segment_t old_fs;
old_fs = get_fs();
set_fs(KERNEL_DS);
And execute the following code after reading:
set_fs(old_fs);
If you don't do this and pass a kernel-mode buffer directly as parameter 2, vfs_read() returns the error code -14 (-EFAULT, "Bad address"). This error code is defined in ./kernel/include/uapi/asm-generic/errno-base.h. Be sure to execute set_fs(KERNEL_DS) before requesting memory, and execute set_fs(old_fs) after releasing the requested memory.
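The value 14 is EFAULT, the code the kernel uses whenever a buffer address fails its user-space checks. The same code can be provoked from user space by handing read() an address that is not mapped. This is a user-space analogue only, not the vfs_read() situation itself: `bad_buffer_errno()` is a hypothetical helper for this sketch, and it assumes a Linux system where address 1 is not mapped into the process.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

static_assert(EFAULT == 14, "on Linux, EFAULT is errno 14");

/* Ask the kernel to copy pipe data to an address that is not mapped in this
 * process and return the resulting errno (0 if the read unexpectedly works). */
int bad_buffer_errno(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) != 0)
        return errno;
    if (write(fds[1], "data", 4) != 4)
        err = errno;
    else if (read(fds[0], (void *)1, 4) < 0)    /* assumed unmapped address */
        err = errno;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```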
4. vfs_write() function
The function prototype is as follows:
ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
These parameters and return values are the same as vfs_read(), so I won’t repeat them.
Example of reading an ordinary file in kernel mode:
The following demo code is compiled into a kernel module (.ko) that reads a file from the file system.
Note that this code was only tested as a .ko loaded with insmod after the Android development board had booted and was running stably. The author has not tried building this driver into the kernel image to see whether it also works when run during system startup.
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/uaccess.h>

#define FN "/sdcard/wanna"

static struct file *fp;
static struct file *wfp;

static int __init init(void)
{
    char *out_file_name;
    const int of_len = strlen(FN) + 5;
    mm_segment_t old_fs;
    int size = 0;
    char rbuf[6];
    loff_t pos = 0;
    loff_t wpos = 0;

    printk("%s()\n", __FUNCTION__);

    /*
     * filp_open() returns once the open has completed or failed.
     * The msleep() calls are kept from the original demo, whose author
     * observed that the following printk output might otherwise not appear.
     */
    fp = filp_open(FN, O_RDONLY, 0); /* The mode (parameter 3) has little effect when only reading. */
    printk("fs file address:0x%p\n", fp);
    msleep(100);
    /* Check the result with IS_ERR(); comparing against NULL does not detect a failure. */
    if (IS_ERR(fp)) {
        printk("cannot open fs.\n");
        goto FS_END;
    }

    out_file_name = kmalloc(of_len, GFP_KERNEL);
    if (out_file_name == NULL) {
        printk("cannot malloc.\n");
        goto FS_END;
    }
    memset(out_file_name, 0, of_len);
    snprintf(out_file_name, of_len, "%s%s", FN, "_out");
    printk("out_file_name:%s\n", out_file_name);
    wfp = filp_open(out_file_name, O_WRONLY | O_CREAT, 0666);
    kfree(out_file_name); /* The name is no longer needed once the open has returned. */
    msleep(100);
    if (IS_ERR(wfp)) {
        printk("cannot open the write file.\n");
        wfp = NULL;
    }

    old_fs = get_fs();
    set_fs(KERNEL_DS);
    while (1) {
        memset(rbuf, 0, 6);
        /*
         * Parameter 2 requires a memory address in __user space.
         * To use an array created in the kernel directly, the address
         * limit must be switched to KERNEL_DS first; that is the
         * set_fs(KERNEL_DS) call above.
         * Parameter 4 is the position of the read pointer. For ordinary
         * files, a valid loff_t pointer must be passed in so that each
         * read resumes where the previous one stopped.
         */
        size = vfs_read(fp, (char __user *)rbuf, 3, &pos);
        printk("read ret:%d, pos:%lld\n", size, pos);
        if (size < 1) {
            printk("read end.\n");
            break;
        }
        printk("\t%s\n", rbuf);
        if (wfp) {
            /* Copy the content that was just read to another file. */
            size = vfs_write(wfp, (const char __user *)rbuf, size, &wpos);
            printk("write ret:%d, pos:%lld\n", size, wpos);
        }
    }
    set_fs(old_fs);
    msleep(50);
FS_END:
    return 0;
}

static void __exit exit(void)
{
    int ret;

    if (!IS_ERR(fp)) {
        printk("closing fs file.\n");
        ret = filp_close(fp, NULL);
        printk("close ret:%d\n", ret);
    }
    if (wfp && !IS_ERR(wfp)) {
        printk("closing wfp.\n");
        ret = filp_close(wfp, 0);
        printk("close wfp ret:%d\n", ret);
    }
    msleep(100);
}

module_init(init);
module_exit(exit);
MODULE_LICENSE("GPL");
By the way, here is the Makefile:
obj-m += mymodule.o
KDIR := /home/chorm/workspace/my_android_src/kernel
PWD ?= $(shell pwd)

all:
	make -C $(KDIR) M=$(PWD) modules

clean:
File management in the Linux kernel
This document describes how locking for files (struct file) and file descriptor table (struct files) works.
Up until 2.6.12, the file descriptor table has been protected with a lock (files->file_lock) and reference count (files->count). ->file_lock protected accesses to all the file related fields of the table. ->count was used for sharing the file descriptor table between tasks cloned with CLONE_FILES flag. Typically this would be the case for posix threads. As with the common refcounting model in the kernel, the last task doing a put_files_struct() frees the file descriptor (fd) table. The files (struct file) themselves are protected using reference count (->f_count).
In the new lock-free model of file descriptor management, the reference counting is similar, but the locking is based on RCU. The file descriptor table contains multiple elements — the fd sets (open_fds and close_on_exec, the array of file pointers, the sizes of the sets and the array etc.). In order for the updates to appear atomic to a lock-free reader, all the elements of the file descriptor table are in a separate structure — struct fdtable. files_struct contains a pointer to struct fdtable through which the actual fd table is accessed. Initially the fdtable is embedded in files_struct itself. On a subsequent expansion of fdtable, a new fdtable structure is allocated and files->fdtab points to the new structure. The fdtable structure is freed with RCU and lock-free readers either see the old fdtable or the new fdtable making the update appear atomic. Here are the locking rules for the fdtable structure —
- All references to the fdtable must be done through the files_fdtable() macro:
struct fdtable *fdt;

rcu_read_lock();

fdt = files_fdtable(files);
...
if (n <= fdt->max_fds)
        ...
...
rcu_read_unlock();

struct file *file;

rcu_read_lock();
file = lookup_fd_rcu(fd);
if (file) {
        ...
}
...
rcu_read_unlock();

rcu_read_lock();
file = files_lookup_fd_rcu(files, fd);
if (file) {
        if (atomic_long_inc_not_zero(&file->f_count))
                *fput_needed = 1;
        else
                /* Didn't get the reference, someone's freed */
                file = NULL;
}
rcu_read_unlock();
...
return file;

spin_lock(&files->file_lock);
fd = locate_fd(files, file, start);
if (fd >= 0) {
        /* locate_fd() may have expanded fdtable, load the ptr */
        fdt = files_fdtable(files);
        __set_open_fd(fd, fdt);
        __clear_close_on_exec(fd, fdt);
        spin_unlock(&files->file_lock);
...
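The files_lookup_fd_rcu() snippet above takes a reference with atomic_long_inc_not_zero(), which succeeds only if the count has not already dropped to zero. A minimal user-space sketch of that idea with C11 atomics is shown below; `ref_inc_not_zero()` is a hypothetical name for this illustration and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *count unless it is already zero.
 * Returns true if the reference was taken. */
bool ref_inc_not_zero(atomic_long *count)
{
    long old;

    assert(count);
    old = atomic_load(count);
    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;   /* the count reached zero: the object is being freed */
}
```

A lock-free reader that gets false back must treat the object as gone, which corresponds to the `file = NULL` branch in the snippet.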