Linux File System, Process and Open File Table [closed]
I’m a little bit confused about process and open file tables. I know that if 2 processes try to open the same file, there will be 2 entries in the open file table. I am trying to find out the reason for this. Why there are 2 entries created in the open file table when 2 different processes try to reach the same file? Why it can’t be done with 1 entry?
I’m not quite clear what you mean by «file tables». There are no common structures in the Linux kernel referred to as «file tables».
There is /etc/fstab, which stands for «filesystem table» and lists the filesystems that are mounted automatically when the system boots.
The «filetable» Stack Overflow tag that you included in this question is for SQL Server and not directly connected with Linux.
What it sounds like you are referring to when you talk about open files is links. See Hard and soft link mechanism. When a file is open in Linux, the kernel holds a reference to the file's inode, which acts much like another hard link. That is why you can delete a file that is open and the program using it will continue running normally. Only when the last application closes the file will the space on the disk actually be marked as free.
So for each inode on a filesystem (an inode is generally what we think of as a file), there are often multiple links—one for each entry in a directory, and one for each time an application opens the file.
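The delete-while-open behaviour described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the function name `read_after_unlink` and the scratch file name are hypothetical, chosen only for illustration.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if data can still be read through an open descriptor after
 * the file's only name has been removed, 0 if not, -1 on a setup error. */
int read_after_unlink(void)
{
    char path[] = "/tmp/unlink_demo_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);                    /* creates and opens the file */
    if (fd < 0)
        return -1;

    const char msg[] = "still here";
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Remove the only directory entry; the descriptor stays valid. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* On typical local filesystems st.st_nlink is now 0: no names are
     * left, yet the inode survives because it is still open. */
    struct stat st;
    if (fstat(fd, &st) != 0 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    char buf[sizeof msg] = {0};
    ssize_t n = read(fd, buf, sizeof buf);
    int ok = (n == (ssize_t)sizeof msg && memcmp(buf, msg, sizeof msg) == 0);

    close(fd);   /* last reference dropped: the space can now be reclaimed */
    return ok;
}
```

Calling `read_after_unlink()` from a small `main` and checking for a return value of 1 is enough to confirm that the data outlives the file name.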
Update: Here is a quote from the web page that inspired this question:
Each file table entry contains information about the current file. Foremost, is the status of the file, such as the file read or write status and other status information. Additionally, the file table entry maintains an offset which describes how many bytes have been read from (or written to) the file indicating where to read/write from next.
So, to directly answer the question, «Why there are 2 entries created in the open file table when 2 different processes try to reach the same file?», 2 entries are required because they may contain different information. One process may open the file read-only while the other read-write. And the file offset (position within the file) for each process will almost certainly be different.
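A small experiment makes the point concrete. This is a sketch under stated assumptions: it uses two open() calls inside one process rather than two processes (each open() gets its own entry either way), it assumes a POSIX system with a writable /tmp, and the function name `independent_opens` is hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Opens the same file twice and checks that each open has its own
 * offset and its own access mode. Returns 1 if so, 0 if not, -1 on error. */
int independent_opens(void)
{
    char path[] = "/tmp/two_opens_XXXXXX";     /* hypothetical scratch file */
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    if (write(tmp, "0123456789", 10) != 10) {
        close(tmp);
        unlink(path);
        return -1;
    }
    close(tmp);

    int fd_ro = open(path, O_RDONLY);          /* first open: read-only   */
    int fd_rw = open(path, O_RDWR);            /* second open: read-write */
    unlink(path);                              /* name no longer needed   */
    if (fd_ro < 0 || fd_rw < 0) {
        close(fd_ro);
        close(fd_rw);
        return -1;
    }

    /* Advance only the read-only descriptor by four bytes. */
    char buf[4];
    if (read(fd_ro, buf, sizeof buf) != (ssize_t)sizeof buf) {
        close(fd_ro);
        close(fd_rw);
        return -1;
    }

    off_t off_ro = lseek(fd_ro, 0, SEEK_CUR);  /* moved by the read */
    off_t off_rw = lseek(fd_rw, 0, SEEK_CUR);  /* untouched         */
    int mode_ro = fcntl(fd_ro, F_GETFL) & O_ACCMODE;
    int mode_rw = fcntl(fd_rw, F_GETFL) & O_ACCMODE;

    close(fd_ro);
    close(fd_rw);
    return off_ro == 4 && off_rw == 0 &&
           mode_ro == O_RDONLY && mode_rw == O_RDWR;
}
```

If both opens shared a single entry, the read on one descriptor would move the offset seen by the other, and only one access mode could be recorded.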
Is the file table in the filesystem or in memory?
In the context of operating system control tables, does the term «file tables» refer to a data structure that is part of the filesystem, or that is in main memory (and in which case I assume it would only have references to open files)? My textbook 1 says,
The tables provide information about the existence of files, their location on secondary memory, their current status, and other attributes. Much, if not all, of this information may be maintained and used by a file management system, in which case the OS has little or no knowledge of files.
Also, what is a file management system? Does that mean the filesystem?
1 Stallings, Operating Systems, 7th ed., p. 127
Without further context it is unclear whether Stallings is talking about the in-memory inode table or the tables within the filesystem. I lent a much earlier edition of the book to someone and never got it back, so I can’t look up the context myself.
There are three «file tables», but the one being discussed here is more commonly called the «in-memory inode table»; the second is commonly called the «open file table» and exists per process. Both tables are in kernel memory and not accessible to a program. The third «table» is really two sets of tables within the filesystem (on disk): the first is the on-disk inode table and the second is the data blocks themselves. (Note: this discussion concerns traditional UNIX filesystem management; newer systems can have different organizations.) Entries in the inode table hold sequences of references to data blocks that contain either indirect reference blocks or actual data. The key to a file on the filesystem is the inode, not the data blocks themselves. When Stallings talks about an on-disk «file table», it will generally be the «smaller» table on disk that denotes files, such as the inode table or the block definition table in FAT systems.
In terms of the in-memory inode table, the inode is loaded from the filesystem, its in-memory reference count is incremented, and it is then made accessible to the rest of the system. (This count is separate from st_nlink, which counts the hard links recorded on disk.) When the inode's metadata changes, st_ctime is updated. If the inode is no longer needed in memory, the reference count is decremented and, once it reaches zero, the entry in the table is marked as free. Every process starts with references to about three to five entries in the in-memory inode table: the inodes of stdin, stdout, and stderr (these are often a device file such as a tty), plus the current directory and the root directory. An inode resides in the table only once, so there may be multiple references to a single entry in the table.
The open file table is kept per process and contains references into the in-memory inode table as well as pointers to buffers and state information (such as the file offset set by lseek(2) and the flags from open(2)). The file descriptor is literally an index into the open file table, but most people mean the entry in the open file table when they talk about the «file descriptor».
When a file is opened using open(2), an available entry in the open file table is found and the inode of the file referenced by the pathname is determined. That inode is loaded into the in-memory inode table if it is not already there, its in-memory reference count is increased, the inode entry is referenced from the open file table entry, flags are set, and buffers are allocated. When the file is closed, the reverse occurs.
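The open and close steps described above can be modelled in a few lines of user-space C. This is a toy sketch, not kernel code: every name (`toy_inode`, `toy_file`, `toy_open`, `toy_close`, `toy_refcount`) is hypothetical, pathname lookup is replaced by passing an inode number directly, and buffers are left out.

```c
#include <stddef.h>

#define MAX_INODES 8
#define MAX_FDS    8

/* Toy in-memory inode table entry; refcount == 0 marks a free slot. */
struct toy_inode { int ino; int refcount; };

/* Toy open file table entry, indexed by file descriptor. */
struct toy_file { struct toy_inode *inode; long offset; int flags; int in_use; };

static struct toy_inode inode_table[MAX_INODES];
static struct toy_file  fd_table[MAX_FDS];

/* Find the inode in the table or load it into a free slot,
 * then take one reference on it. An inode appears at most once. */
static struct toy_inode *inode_get(int ino)
{
    struct toy_inode *free_slot = NULL;
    for (int i = 0; i < MAX_INODES; i++) {
        if (inode_table[i].refcount > 0 && inode_table[i].ino == ino) {
            inode_table[i].refcount++;
            return &inode_table[i];
        }
        if (inode_table[i].refcount == 0 && free_slot == NULL)
            free_slot = &inode_table[i];
    }
    if (free_slot != NULL) {
        free_slot->ino = ino;
        free_slot->refcount = 1;
    }
    return free_slot;
}

/* Take the lowest free descriptor and point it at the inode. */
int toy_open(int ino, int flags)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!fd_table[fd].in_use) {
            struct toy_inode *ip = inode_get(ino);
            if (ip == NULL)
                return -1;
            fd_table[fd].inode = ip;
            fd_table[fd].offset = 0;
            fd_table[fd].flags = flags;
            fd_table[fd].in_use = 1;
            return fd;
        }
    }
    return -1;
}

/* Reverse of toy_open: drop the inode reference and free the slot. */
int toy_close(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !fd_table[fd].in_use)
        return -1;
    fd_table[fd].inode->refcount--;
    fd_table[fd].in_use = 0;
    return 0;
}

/* Reference count of the inode behind an open descriptor, or -1. */
int toy_refcount(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !fd_table[fd].in_use)
        return -1;
    return fd_table[fd].inode->refcount;
}
```

Opening inode 42 twice with `toy_open` yields descriptors 0 and 1 that share one inode entry with a reference count of 2; closing one of them drops the count to 1.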
The routines within the kernel are called the «file management system», and the «filesystem» is the organization on disk. These days there are a number of «pluggable» modules that can be loaded (modprobe(8)) into the file management system for different organizations on disk. For example, there are ext2/ext3/ext4 filesystem types, and each of them has a different module in the kernel’s file management system; the same goes for ntfs, sbfs, nfs, vfat, jfs, etc.
This is a bit more long-winded than I originally intended, so I’ll stop here.
Evolution of File Descriptor Table in Linux Kernel
See https://en.wikipedia.org/wiki/File_descriptor for per-process file descriptor table, system-wide file table and inode table.
- The simplest case is one process opening one disk file (e.g. open("/var/log/access.log", O_WRONLY)); it uses one entry from each table.
- Then the process does a dup(); the old and new file descriptors may be used interchangeably. The two file descriptors refer to the same open file, and thus share the file offset and file status flags. However, the two file descriptors do not share file descriptor flags (close_on_exec).
- Later, the process does an open("/var/log/access.log", O_RDONLY); a new file descriptor and a new file description are created, pointing to the same inode entry. Therefore, it sees the same file size as the first two fds, but a different offset.
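The three cases in the list above can be checked with a short program. This is a minimal sketch, assuming a POSIX system with a writable /tmp; a scratch file stands in for /var/log/access.log, and the function name `dup_vs_open` is hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if dup() shares the offset while a second open() does not,
 * and if the close-on-exec flag is per descriptor. -1 on setup error. */
int dup_vs_open(void)
{
    char path[] = "/tmp/dup_demo_XXXXXX";      /* hypothetical scratch file */
    int fd1 = mkstemp(path);                   /* case 1: one open          */
    if (fd1 < 0)
        return -1;
    int fd2 = dup(fd1);                        /* case 2: same description  */
    int fd3 = open(path, O_RDONLY);            /* case 3: new description   */
    unlink(path);
    if (fd2 < 0 || fd3 < 0 || write(fd1, "hello", 5) != 5) {
        close(fd1);
        close(fd2);
        close(fd3);
        return -1;
    }

    off_t off_dup = lseek(fd2, 0, SEEK_CUR);   /* follows fd1's write */
    off_t off_new = lseek(fd3, 0, SEEK_CUR);   /* independent offset  */

    struct stat st;                            /* same inode, same size */
    int stat_ok = (fstat(fd3, &st) == 0 && st.st_size == 5);

    /* File descriptor flags live in the descriptor, not the description. */
    int set_ok = (fcntl(fd1, F_SETFD, FD_CLOEXEC) == 0);
    int cloexec1 = fcntl(fd1, F_GETFD) & FD_CLOEXEC;
    int cloexec2 = fcntl(fd2, F_GETFD) & FD_CLOEXEC;

    close(fd1);
    close(fd2);
    close(fd3);
    return stat_ok && set_ok && off_dup == 5 && off_new == 0 &&
           cloexec1 != 0 && cloexec2 == 0;
}
```

The write through `fd1` moves the offset seen through `fd2`, while `fd3` still starts at zero even though `fstat` reports the five bytes already written.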
Sizes of those three tables in early Linux
Note 1: file_table and inode_table were made dynamic in 0.99.10.
[PATCH] Linux-0.99.10 (June 7, 1993)

The "struct file" file_table is made dynamic, instaed of a static
allocation. For the first time you can have _lots_ of files open.

diff --git a/fs/file_table.c b/fs/file_table.c
--- a/fs/file_table.c
+++ b/fs/file_table.c
-struct file file_table[NR_FILE];
+struct file * first_file;
+int nr_files = 0;

Note 2: ext2 file system was added in 0.99.7.
1.1.11 to 1.3.21
Split into struct files_struct . 1.1.11 was released in 1995/05.
// include/linux/sched.h of linux-1.3.21
struct files_struct {
    int count;
    fd_set close_on_exec;
    struct file * fd[NR_OPEN];
};

struct task_struct {
    // ...
    /* filesystem information */
    struct fs_struct fs[1];
    /* open file information */
    struct files_struct files[1];
    /* memory management info */
    struct mm_struct mm[1];
    // ...
};
1.3.22 to 2.1.89
Change files from a struct[1] to a pointer, so it can be shared by threads within a process. 1.3.22 was released in 1995/09. LinuxThreads needs 2.0 kernel, which was released in 1996/07.
// include/linux/sched.h of linux-2.0.2
/* Open file table structure */
struct files_struct {
    int count;
    fd_set close_on_exec;
    fd_set open_fds;
    struct file * fd[NR_OPEN];
};

struct task_struct {
    // ...
    /* filesystem information */
-   struct fs_struct fs[1];
+   struct fs_struct *fs;
    /* open file information */
-   struct files_struct files[1];
+   struct files_struct *files;
    /* memory management info */
-   struct mm_struct mm[1];
+   struct mm_struct *mm;
    // ...
};
2.1.90 to 2.6.13
Change fixed-length array fd to dynamic array. 2.2.0 was released in 1999/01.
// include/linux/sched.h of linux-2.2.0
/*
 * Open file table structure
 */
struct files_struct {
    atomic_t count;
+   int max_fds;
+   struct file ** fd;      /* current fd array */
    fd_set close_on_exec;   // changed to fd_set* in 2.2.12
    fd_set open_fds;
-   struct file * fd[NR_OPEN];
};

struct task_struct {
    // ...
    /* open file information */
    struct files_struct *files;
    // ...
};
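The idea behind replacing a fixed array with max_fds plus a struct file ** pointer can be illustrated in user space. The following is only a sketch of a growable descriptor array: the names `toy_file`, `toy_files`, and `toy_fd_install` are hypothetical, the doubling policy is an assumption made for the example, and none of the kernel's locking or limits are modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct file; only its address matters here. */
struct toy_file { long pos; };

/* Stand-in for the max_fds / fd pair in files_struct. */
struct toy_files {
    int max_fds;             /* current capacity of the fd array */
    struct toy_file **fd;    /* current fd array, NULL = free    */
};

/* Install f in the lowest free slot, growing the array when it is
 * full. Returns the descriptor number, or -1 if allocation fails. */
int toy_fd_install(struct toy_files *files, struct toy_file *f)
{
    for (int i = 0; i < files->max_fds; i++) {
        if (files->fd[i] == NULL) {
            files->fd[i] = f;
            return i;
        }
    }

    /* No free slot: double the capacity (assumed policy), start at 4. */
    int new_max = files->max_fds ? files->max_fds * 2 : 4;
    struct toy_file **grown = realloc(files->fd, (size_t)new_max * sizeof *grown);
    if (grown == NULL)
        return -1;

    /* Mark the newly added slots as free. */
    memset(grown + files->max_fds, 0,
           (size_t)(new_max - files->max_fds) * sizeof *grown);

    int fd = files->max_fds;     /* first slot of the new region */
    grown[fd] = f;
    files->fd = grown;
    files->max_fds = new_max;
    return fd;
}
```

Starting from an empty `struct toy_files`, installing five files returns descriptors 0 through 4 and leaves `max_fds` at 8, so the number of open files is no longer bounded by a compile-time constant such as NR_OPEN.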
2.6.14 to now (4.15.7)
Introduce struct fdtable for RCU. 2.6.15 was released in 2006/01; Ubuntu 6.06 LTS and Debian 4 ship it.
// include/linux/fdtable.h of linux-2.6.37
struct fdtable {
    unsigned int max_fds;
    struct file __rcu **fd;      /* current fd array */
    fd_set *close_on_exec;
    fd_set *open_fds;
    struct rcu_head rcu;
    struct fdtable *next;
};

/*
 * Open file table structure
 */
struct files_struct {
    /*
     * read mostly part
     */
    atomic_t count;
    struct fdtable __rcu *fdt;
    struct fdtable fdtab;
    /*
     * written part on a separate cache line in SMP
     */
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    int next_fd;
    struct embedded_fd_set close_on_exec_init;
    struct embedded_fd_set open_fds_init;
    struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

struct task_struct {
    // ...
    /* open file information */
    struct files_struct *files;
    // ...
};

// include/linux/fs.h of linux-4.9
struct file {
    union {
        struct llist_node fu_llist;
        struct rcu_head fu_rcuhead;
    } f_u;
    struct path f_path;
    struct inode *f_inode;  /* cached value */  // added back in 3.9, same as f_path.dentry->d_inode
    const struct file_operations *f_op;

    /*
     * Protects f_ep_links, f_flags.
     * Must not be taken from IRQ context.
     */
    spinlock_t f_lock;
    atomic_long_t f_count;
    unsigned int f_flags;
    fmode_t f_mode;
    struct mutex f_pos_lock;  // Fixed in 3.14
    loff_t f_pos;
    struct fown_struct f_owner;
    const struct cred *f_cred;
    struct file_ra_state f_ra;

    u64 f_version;
#ifdef CONFIG_SECURITY
    void *f_security;
#endif
    /* needed for tty driver, and maybe others */
    void *private_data;

#ifdef CONFIG_EPOLL
    /* Used by fs/eventpoll.c to link all the hooks to this file */
    struct list_head f_ep_links;
    struct list_head f_tfile_llink;
#endif /* #ifdef CONFIG_EPOLL */
    struct address_space *f_mapping;
} __attribute__((aligned(4)));  /* lest something weird decides that 2 is OK */
FreeBSD up to 9.3
4.3BSD-Reno and older BSDs use a fixed-length array of struct file*.
BSD Net/2 through FreeBSD 9.3 use a dynamic array data structure similar to that of Linux 2.0 (see diagram above), where proc == task_struct, filedesc == files_struct, file == file.
// sys/proc.h
/*
 * Process structure.
 */
struct proc {
    // ...
    struct filedesc *p_fd;       /* (b) Open files. */
    // ...
};

// sys/filedesc.h
struct filedesc {
    struct file **fd_ofiles;     /* file structures for open files */
    char *fd_ofileflags;         /* per-process open file flags */
    struct vnode *fd_cdir;       /* current directory */
    struct vnode *fd_rdir;       /* root directory */
    struct vnode *fd_jdir;       /* jail root directory */
    int fd_nfiles;               /* number of open files allocated */
    NDSLOTTYPE *fd_map;          /* bitmap of free fds */
    int fd_lastfile;             /* high-water mark of fd_ofiles */
    int fd_freefile;             /* approx. next free file */
    u_short fd_cmask;            /* mask for file creation */
    u_short fd_refcnt;           /* thread reference count */
    u_short fd_holdcnt;          /* hold count on structure + mutex */
    struct sx fd_sx;             /* protects members of this struct */
    struct kqlist fd_kqlist;     /* list of kqueues on this filedesc */
    int fd_holdleaderscount;     /* block fdfree() for shared close() */
    int fd_holdleaderswakeup;    /* fdfree() needs wakeup */
};

// sys/file.h
struct file {
    void *f_data;                /* file descriptor specific data */
    struct fileops *f_ops;       /* File operations */
    struct ucred *f_cred;        /* associated credentials. */
    struct vnode *f_vnode;       /* NULL or applicable vnode */
    short f_type;                /* descriptor type */
    short f_vnread_flags;        /* (f) Sleep lock for f_offset */
    volatile u_int f_flag;       /* see fcntl.h */
    volatile u_int f_count;      /* reference count */
    // ...
    off_t f_offset;
    // ...
};
Unix system file tables
Your question is about open files and processes; the other question is about open files and fork . Processes are created by the fork system call. Thus, they are essentially the same question.
There are three «system file tables». A file descriptor table maps file descriptors (small integers) to entries in the open file table. Each entry in the open file table contains (among other things) a file offset and a pointer into the in-memory inode table. (Picture source: rich from www.cs.ucsb.edu, now on archive.org.)
So there is neither just one file table entry per open file nor just one per process. There is one per open() call, and it is shared if the file descriptor is dup()ed or inherited across fork().
- When two or more processes open a file for reading, there’s an entry in the open file table per open. There is even an entry per open if one process opens the file multiple times.
- Different processes opening the same file do not share a single open file table entry; each open gets its own (but there is just one entry in the in-memory inode table).
- If file1.txt is opened twice, in the same or two different processes, there are two different open file table entries (but just one entry in the in-memory inode table).
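The sharing across fork() mentioned above can be verified directly. This is a minimal sketch, assuming a POSIX system with a writable /tmp; the function name `fork_shares_offset` and the scratch file are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a write made by the child through an inherited descriptor
 * moves the offset seen by the parent, 0 if not, -1 on a setup error. */
int fork_shares_offset(void)
{
    char path[] = "/tmp/fork_demo_XXXXXX";     /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                              /* name no longer needed */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: the inherited fd refers to the parent's open file entry. */
        ssize_t n = write(fd, "child", 5);
        _exit(n == 5 ? 0 : 1);
    }

    /* Parent: wait for the child, then look at our own offset. */
    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    off_t off = lseek(fd, 0, SEEK_CUR);        /* advanced by the child */
    close(fd);
    return off == 5;
}
```

The parent never writes, yet its offset ends up at 5, because parent and child hold descriptors that point at the same open file table entry. Had the child called open() itself, it would have received a separate entry with its own offset.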