Linux root file system mount


How does a kernel mount the root partition?

My question concerns booting a Linux system from a separate /boot partition. If most configuration files are located on a separate / partition, how does the kernel correctly mount it at boot time? Any elaboration on this would be great. I feel as though I am missing something basic. I am mostly concerned with the process and order of operations. Thanks!

EDIT: I think what I needed to ask was more along the lines of the dev file that is used in the root kernel parameter. For instance, say I give my root param as root=/dev/sda2. How does the kernel have a mapping of the /dev/sda2 file?

Though people below cover initrd, there is little discussion about why initrd is used. My impression is that this is because distributions like Debian want to use one kernel on many different machines of the same architecture, but with possibly widely different hardware. This is made possible by modularizing hardware support via kernel modules. The initrd does not require much hardware support to boot, and once it has booted, it loads the necessary hardware modules to proceed. Elaborations/corrections to this are appreciated.

6 Answers

In ancient times, the kernel was hard coded to know the device major/minor number of the root fs and mounted that device after initializing all device drivers, which were built into the kernel. The rdev utility could be used to modify the root device number in the kernel image without having to recompile it.

Eventually boot loaders came along and could pass a command line to the kernel. If the root= argument was passed, that told the kernel where the root fs was instead of the built-in value. The drivers needed to access that device still had to be built into the kernel. While the argument looks like a normal device node in the /dev directory, there obviously is no /dev directory before the root fs is mounted, so the kernel cannot look up a dev node there. Instead, certain well-known device names are hard coded into the kernel so the string can be translated to the device number. Because of this, the kernel can recognize things like /dev/sda1, but not more exotic things like /dev/mapper/vg0-root or a volume UUID.
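To make the name translation concrete, here is a minimal user-space sketch of the idea, not the kernel's own routine. The function name name_to_devnum and the known_disks table are hypothetical; the sd and hd major/minor numbering follows the traditional Linux device list. The string is never opened as a file, only pattern-matched.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry describing one family of well-known disk names. */
struct known_disk {
    const char *prefix;        /* name without the unit letter, e.g. "sd" */
    unsigned major;            /* block major number */
    unsigned minors_per_disk;  /* minors reserved for each whole disk */
    unsigned disks;            /* unit letters handled by this major */
};

static const struct known_disk known_disks[] = {
    { "sd", 8, 16, 16 },  /* sda = 8:0, sdb = 8:16, ... sdp = 8:240 */
    { "hd", 3, 64, 2 },   /* hda = 3:0, hdb = 3:64 */
};

/*
 * Translate a string such as "/dev/sda2" into major/minor numbers without
 * touching any filesystem. Returns 0 on success, -1 for unknown names.
 */
int name_to_devnum(const char *name, unsigned *major, unsigned *minor)
{
    if (strncmp(name, "/dev/", 5) != 0)
        return -1;
    name += 5;

    for (size_t i = 0; i < sizeof known_disks / sizeof known_disks[0]; i++) {
        const struct known_disk *d = &known_disks[i];
        size_t plen = strlen(d->prefix);
        const char *p;
        unsigned long part = 0;  /* 0 means the whole disk */

        if (strncmp(name, d->prefix, plen) != 0)
            continue;

        /* One lowercase unit letter selects the disk. */
        p = name + plen;
        if (*p < 'a' || (unsigned)(*p - 'a') >= d->disks)
            return -1;
        unsigned unit = (unsigned)(*p - 'a');
        p++;

        /* An optional decimal partition number follows. */
        if (*p != '\0') {
            char *end;
            if (*p < '0' || *p > '9')
                return -1;
            part = strtoul(p, &end, 10);
            if (*end != '\0' || part == 0 || part >= d->minors_per_disk)
                return -1;
        }

        *major = d->major;
        *minor = unit * d->minors_per_disk + (unsigned)part;
        return 0;
    }
    return -1;  /* e.g. /dev/mapper/vg0-root: not in the table */
}
```

With this table, "/dev/sda2" maps to 8:2 and "/dev/sdb1" to 8:17, while "/dev/mapper/vg0-root" is rejected because no table entry matches it.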

Later, the initrd came into the picture. Along with the kernel, the boot loader would load the initrd image, which was some kind of compressed filesystem image (gzipped ext2 image, gzipped romfs image, squashfs finally became dominant). The kernel would decompress this image into a ramdisk and mount the ramdisk as the root fs. This image contained some additional drivers and boot scripts instead of a real init. These boot scripts performed various tasks to recognize hardware, activate things like raid arrays and LVM, detect UUIDs, and parse the kernel command line to find the real root, which could now be specified by UUID, volume label and other advanced things. The scripts then mounted the real root fs in /initrd, executed the pivot_root system call to have the kernel swap / and /initrd, and finally exec'd /sbin/init on the real root, which would then unmount /initrd and free the ramdisk.

Finally, today we have the initramfs. This is similar to the initrd, but instead of being a compressed filesystem image that is loaded into a ramdisk, it is a compressed cpio archive. A tmpfs is mounted as the root, and the archive is extracted there. Instead of using pivot_root, which was regarded as a dirty hack, the initramfs boot scripts mount the real root in /root, delete all files in the tmpfs root, then chroot into /root, and exec /sbin/init.
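For a feel of what "a cpio archive" means at the byte level, the sketch below decodes one entry header, assuming the "newc" cpio variant (magic 070701, then 13 fields of 8 hex digits each). The struct and function names are hypothetical, and this only illustrates the layout; it is not the kernel's unpacking code.

```c
#include <stddef.h>
#include <string.h>

#define CPIO_HEADER_LEN 110  /* 6-byte magic + 13 fields of 8 hex digits */

/* The subset of header fields this sketch extracts. */
struct cpio_entry {
    unsigned long mode;      /* file type and permission bits */
    unsigned long filesize;  /* length of the file data in bytes */
    unsigned long namesize;  /* length of the path name, including NUL */
};

/* Decode exactly 8 ASCII hex digits; returns -1 on any other character. */
static int hex8(const char *s, unsigned long *out)
{
    unsigned long v = 0;

    for (int i = 0; i < 8; i++) {
        char c = s[i];
        unsigned digit;

        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A') + 10;
        else
            return -1;
        v = (v << 4) | digit;
    }
    *out = v;
    return 0;
}

/*
 * Parse one header. Field order after the magic: ino, mode, uid, gid,
 * nlink, mtime, filesize, devmajor, devminor, rdevmajor, rdevminor,
 * namesize, check. Returns 0 on success, -1 on a malformed header.
 */
int cpio_parse_header(const char *hdr, size_t len, struct cpio_entry *e)
{
    if (len < CPIO_HEADER_LEN || memcmp(hdr, "070701", 6) != 0)
        return -1;
    if (hex8(hdr + 6 + 1 * 8, &e->mode) ||
        hex8(hdr + 6 + 6 * 8, &e->filesize) ||
        hex8(hdr + 6 + 11 * 8, &e->namesize))
        return -1;
    return 0;
}
```

An extractor would read the name and data that follow each header, create the file in the tmpfs root, and repeat until the archive ends.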


Mounting the Root Filesystem

Mounting the root filesystem is a crucial part of system initialization. It is a fairly complex procedure because the Linux kernel allows the root filesystem to be stored in many different places, such as a hard disk partition, a floppy disk, a remote filesystem shared via NFS, or even a fictitious block device kept in RAM.

To keep the description simple, let’s assume that the root filesystem is stored in a partition of a hard disk (the most common case, after all). While the system boots, the kernel finds the major number of the disk that contains the root filesystem in the ROOT_DEV variable.

The root filesystem can be specified as a device file in the /dev directory either when compiling the kernel or by passing a suitable "root" option to the initial bootstrap loader. Similarly, the mount flags of the root filesystem are stored in the root_mountflags variable. The user specifies these flags either by using the rdev external program on a compiled kernel image or by passing a suitable rootflags option to the initial bootstrap loader (see Appendix A).
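As an illustration of how such options could be picked out of the command line handed over by the boot loader, here is a small user-space sketch. The struct root_params type and the parse_root_params and get_param functions are hypothetical names; the kernel uses its own option-handling machinery. In this sketch a later occurrence of an option overrides an earlier one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the root-related boot options. */
struct root_params {
    char root[64];        /* value of root=, e.g. "/dev/sda2" */
    char rootflags[64];   /* value of rootflags= */
    char rootfstype[64];  /* value of rootfstype= */
};

/* Copy the value of "key=value" from a space-separated line into out. */
static void get_param(const char *cmdline, const char *key,
                      char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    out[0] = '\0';
    while (*p) {
        while (*p == ' ')
            p++;
        size_t toklen = strcspn(p, " ");  /* length of the current token */

        /* Match "key=" exactly, so "root" does not match "rootflags=". */
        if (toklen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = toklen - klen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;  /* truncate overly long values */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
        }
        p += toklen;
    }
}

void parse_root_params(const char *cmdline, struct root_params *rp)
{
    get_param(cmdline, "root", rp->root, sizeof rp->root);
    get_param(cmdline, "rootflags", rp->rootflags, sizeof rp->rootflags);
    get_param(cmdline, "rootfstype", rp->rootfstype, sizeof rp->rootfstype);
}
```

Given "root=/dev/sda2 ro rootfstype=ext3 rootflags=data=journal", the three fields receive "/dev/sda2", "data=journal", and "ext3" respectively.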

Mounting the root filesystem is a two-stage procedure, shown in the following list.

1. The kernel mounts the special rootfs filesystem, which just provides an empty directory that serves as initial mount point.

2. The kernel mounts the real root filesystem over the empty directory.

Why does the kernel bother to mount the rootfs filesystem before the real one? Well, the rootfs filesystem allows the kernel to easily change the real root filesystem. In fact, in some cases, the kernel mounts and unmounts several root filesystems, one after the other. For instance, the initial bootstrap floppy disk of a distribution might load in RAM a kernel with a minimal set of drivers, which mounts as root a minimal filesystem stored in a RAM disk. Next, the programs in this initial root filesystem probe the hardware of the system (for instance, they determine whether the hard disk is EIDE, SCSI, or whatever), load all needed kernel modules, and remount the root filesystem from a physical block device.

The first stage is performed by the init_mount_tree( ) function, which is executed during system initialization:

struct file_system_type root_fs_type;
root_fs_type.name = "rootfs";
root_fs_type.read_super = rootfs_read_super;
root_fs_type.fs_flags = FS_NOMOUNT;
register_filesystem(&root_fs_type);
root_vfsmnt = do_kern_mount("rootfs", 0, "rootfs", NULL);

The root_fs_type variable stores the descriptor object of the rootfs special filesystem; its fields are initialized, and then it is passed to the register_filesystem( ) function (see the earlier Section 12.3.2). The do_kern_mount( ) function mounts the special filesystem and returns the address of a new mounted filesystem object; this address is saved by init_mount_tree( ) in the root_vfsmnt variable. From now on, root_vfsmnt represents the root of the tree of the mounted filesystems.
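The registration step can be pictured with a simplified user-space model of a filesystem-type registry. This is a sketch under assumptions, not the kernel source: the struct is cut down to three fields, the FS_NOMOUNT value is illustrative, and rejecting duplicates with -EBUSY is a choice made for this model.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FS_NOMOUNT 16  /* illustrative value for this model */

/* Cut-down model of a filesystem type descriptor. */
struct file_system_type {
    const char *name;
    int fs_flags;
    struct file_system_type *next;  /* singly linked registry list */
};

static struct file_system_type *file_systems;  /* head of the registry */

/* Append fs to the registry; refuse a second type with the same name. */
int register_filesystem(struct file_system_type *fs)
{
    struct file_system_type **p;

    if (fs->next != NULL)
        return -EBUSY;  /* descriptor is already linked somewhere */
    for (p = &file_systems; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -EBUSY;
    *p = fs;
    return 0;
}

/* Look a type up by name, as a mount routine would before mounting. */
struct file_system_type *get_fs_type(const char *name)
{
    struct file_system_type *fs;

    for (fs = file_systems; fs != NULL; fs = fs->next)
        if (strcmp(fs->name, name) == 0)
            return fs;
    return NULL;
}
```

Once "rootfs" is registered this way, a later lookup by name finds its descriptor, which is what lets the mount call refer to the filesystem by the string "rootfs".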

The do_kern_mount( ) function receives the following parameters:

type: The type of filesystem to be mounted

flags: The mount flags (see Table 12-13 in the later Section 12.4.2)

name: The device file name of the block device storing the filesystem (or the filesystem type name for special filesystems)

data: Pointers to additional data to be passed to the read_super method of the filesystem

The function takes care of the actual mount operation by performing the following operations:

1. Checks whether the current process has the privileges for the mount operation (the check always succeeds when the function is invoked by init_mount_tree( ) because the system initialization is carried out by a process owned by root).

2. Invokes get_fs_type( ) to search in the list of filesystem types and locate the name stored in the type parameter; get_fs_type( ) returns the address of the corresponding file_system_type descriptor.

3. Invokes alloc_vfsmnt( ) to allocate a new mounted filesystem descriptor and stores its address in the mnt local variable.

4. Initializes the mnt->mnt_devname field with the content of the name parameter.

5. Allocates a new superblock and initializes it. do_kern_mount( ) checks the flags in the file_system_type descriptor to determine how to do this:

a. If FS_REQUIRES_DEV is on, invokes get_sb_bdev( ) (see the later Section 12.4.2)

b. If FS_SINGLE is on, invokes get_sb_single( ) (see the later Section 12.4.2)

c. Otherwise, invokes get_sb_nodev( )

6. If the FS_NOMOUNT flag in the file_system_type descriptor is on, sets the MS_NOUSER flag in the superblock object.

7. Initializes the mnt->mnt_sb field with the address of the new superblock object.

8. Initializes the mnt->mnt_root and mnt->mnt_mountpoint fields with the address of the dentry object corresponding to the root directory of the filesystem.

9. Initializes the mnt->mnt_parent field with the value in mnt (the newly mounted filesystem has no parent).

10. Releases the s_umount semaphore of the superblock object (it was acquired when the object was allocated in Step 5).

11. Returns the address mnt of the mounted filesystem object.
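The flag tests in Steps 5 and 6 can be sketched as two small functions. The numeric flag values, the enum, and the function names choose_sb_method and superblock_flags are assumptions made for this illustration; only the decision logic is taken from the steps above.

```c
/* Illustrative flag values; only the tests below matter for the sketch. */
#define FS_REQUIRES_DEV 1
#define FS_SINGLE       8
#define FS_NOMOUNT      16
#define MS_NOUSER       (1U << 31)

/* Which superblock allocation routine Step 5 would pick. */
enum sb_method {
    SB_BDEV,    /* stands for get_sb_bdev( ) */
    SB_SINGLE,  /* stands for get_sb_single( ) */
    SB_NODEV    /* stands for get_sb_nodev( ) */
};

/* Step 5: select the allocation routine from the filesystem type flags. */
enum sb_method choose_sb_method(int fs_flags)
{
    if (fs_flags & FS_REQUIRES_DEV)
        return SB_BDEV;
    if (fs_flags & FS_SINGLE)
        return SB_SINGLE;
    return SB_NODEV;
}

/* Step 6: a type marked FS_NOMOUNT yields a superblock with MS_NOUSER. */
unsigned superblock_flags(int fs_flags, unsigned mount_flags)
{
    if (fs_flags & FS_NOMOUNT)
        mount_flags |= MS_NOUSER;
    return mount_flags;
}
```

For rootfs, whose only type flag is FS_NOMOUNT, the first function selects the no-device path and the second adds MS_NOUSER to the superblock flags.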

When the do_kern_mount( ) function is invoked by init_mount_tree( ) to mount the rootfs special filesystem, neither the FS_REQUIRES_DEV flag nor the FS_SINGLE flag is set, so the function uses get_sb_nodev( ) to allocate the superblock object. This function executes the following steps:

1. Invokes get_unnamed_dev( ) to allocate a new fictitious block device identifier (see the earlier Section 12.3.1).

2. Invokes the read_super( ) function, passing to it the filesystem type object, the mount flags, and the fictitious block device identifier. In turn, this function performs the following actions:

a. Allocates a new superblock object and puts its address in the local variable s.

b. Initializes the s->s_dev field with the block device identifier.

c. Initializes the s->s_flags field with the mount flags (see Table 12-13).

d. Initializes the s->s_type field with the filesystem type descriptor of the filesystem.

f. Inserts the superblock in the global circular list whose head is super_blocks.

g. Inserts the superblock in the filesystem type list whose head is s->s_type->fs_supers.

i. Acquires for writing the s->s_umount read/write semaphore.

j. Acquires the s->s_lock semaphore.

k. Invokes the read_super method of the filesystem type.

l. Sets the MS_ACTIVE flag in s->s_flags.

m. Releases the s->s_lock semaphore.

n. Returns the address s of the superblock.

3. If the filesystem type is implemented by a kernel module, increments its usage counter.

4. Returns the address of the new superblock.
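Step 1 hands out an identifier that corresponds to no real hardware. A minimal sketch of such an allocator follows, assuming identifiers with major number 0 and an 8-bit minor packed as (major << 8) | minor. The function names alloc_unnamed_dev and free_unnamed_dev and the in-use table are hypothetical, not the kernel's implementation.

```c
#include <stdbool.h>

#define UNNAMED_MAJOR 0    /* assumed major for fictitious devices */
#define MAX_UNNAMED   256  /* 8-bit minor number space */

static bool unnamed_in_use[MAX_UNNAMED];  /* one slot per minor number */

/*
 * Return a fresh identifier, or 0 if all minors are taken.
 * Minor 0 is never handed out, so 0 can signal failure.
 */
unsigned alloc_unnamed_dev(void)
{
    for (unsigned minor = 1; minor < MAX_UNNAMED; minor++) {
        if (!unnamed_in_use[minor]) {
            unnamed_in_use[minor] = true;
            return ((unsigned)UNNAMED_MAJOR << 8) | minor;
        }
    }
    return 0;
}

/* Release an identifier so that a later mount can reuse it. */
void free_unnamed_dev(unsigned dev)
{
    unnamed_in_use[dev & 0xff] = false;
}
```

Each special filesystem mounted without a backing device would receive a distinct identifier this way, which lets the rest of the code treat its superblock like any other.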

The second stage of the mount operation for the root filesystem is performed by the mount_root( ) function near the end of the system initialization. For the sake of brevity, we consider the case of a disk-based filesystem whose device files are handled in the traditional way (we briefly discuss in Chapter 13 how the devfs virtual filesystem offers an alternative way to handle device files). In this case, the function performs the following operations:

1. Allocates a buffer and fills it with a list of filesystem type names. This list is either passed to the kernel in the rootfstype boot parameter or is built by scanning the elements in the singly linked list of filesystem types.

2. Invokes the bdget( ) and blkdev_get( ) functions to check whether the ROOT_DEV root device exists and is properly working.

3. Invokes get_super( ) to search for a superblock object associated with the ROOT_DEV device in the super_blocks list. Usually none is found because the root filesystem is still to be mounted. The check is made, however, because it is possible to remount a previously mounted filesystem. Usually the root filesystem is mounted twice during the system boot: the first time as a read-only filesystem so that its integrity can be safely checked; the second time for reading and writing so that normal operations can start. We’ll suppose that no superblock object associated with the ROOT_DEV device is found in the super_blocks list.

4. Scans the list of filesystem type names built in Step 1. For each name, invokes get_fs_type( ) to get the corresponding file_system_type object, and invokes read_super( ) to attempt to read the corresponding superblock from disk. As described earlier, this function allocates a new superblock object and attempts to fill it by using the method to which the read_super field of the file_system_type object points. Since each filesystem-specific method uses unique magic numbers, all read_super( ) invocations will fail except the one that attempts to fill the superblock by using the method of the filesystem really used on the root device. The read_super( ) method also creates an inode object and a dentry object for the root directory; the dentry object maps to the inode object.

5. Allocates a new mounted filesystem object and initializes its fields with the ROOT_DEV block device name, the address of the superblock object, and the address of the dentry object of the root directory.

6. Invokes the graft_tree( ) function, which inserts the new mounted filesystem object in the children list of root_vfsmnt, in the global list of mounted filesystem objects, and in the mount_hashtable hash table.

7. Sets the root and pwd fields of the fs_struct table of current (the init process) to the dentry object of the root directory.
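The trial-and-error detection in Step 4 can be sketched in user space as a loop over candidate filesystem types, each recognized by its magic number. The probes table and the detect_root_fs function are hypothetical; the device is modeled as a byte buffer, and the ext2 and minix magic values and offsets are stated here as assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: where a filesystem keeps its 16-bit magic number. */
struct fs_probe {
    const char *name;
    size_t magic_offset;  /* byte offset from the start of the device */
    uint16_t magic;       /* stored little-endian on disk */
};

static const struct fs_probe probes[] = {
    { "minix", 1024 + 16, 0x137F },
    { "ext2",  1024 + 56, 0xEF53 },
};

/*
 * Try every known type in turn; all probes fail except the one whose
 * magic number matches. Returns the type name, or NULL if none matches.
 */
const char *detect_root_fs(const uint8_t *dev, size_t len)
{
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        const struct fs_probe *p = &probes[i];
        uint16_t found;

        if (p->magic_offset + 2 > len)
            continue;  /* device too small to hold this superblock */
        found = (uint16_t)(dev[p->magic_offset] |
                           (dev[p->magic_offset + 1] << 8));
        if (found == p->magic)
            return p->name;
    }
    return NULL;
}
```

A real read_super method does far more than compare one number, but the control flow is the same: attempt each type, and keep the first one that accepts the on-disk superblock.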
