Linux software RAID 1 — root filesystem becomes read-only after a fault on one disk
We’ve had a problem on several systems where a fault on one drive has locked the root filesystem read-only, which obviously causes problems.
[root@myserver /]# mount | grep Root
/dev/mapper/VolGroup00-LogVolRoot on / type ext3 (rw)
[root@myserver /]# touch /foo
touch: cannot touch `/foo': Read-only file system
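Side note (not from the original report): on systems of this vintage mount reads /etc/mtab, which can go stale once the kernel remounts the root read-only, so it may still show (rw); /proc/mounts reflects the kernel's actual state. A quick check, with illustrative output:

[root@myserver /]# grep ' / ' /proc/mounts
/dev/mapper/VolGroup00-LogVolRoot / ext3 ro,data=ordered 0 0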
I can see that one of the partitions in the array is faulted:
[root@myserver /]# mdadm --detail /dev/md1
/dev/md1:
[...]
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
[...]
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
       2       8        2        -      faulty spare  /dev/sda2
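For a quicker summary, /proc/mdstat shows each array's member count and which slot is missing; a degraded two-way RAID 1 appears as [2/1] with an underscore for the failed member. Illustrative output (the block count is a placeholder):

[root@myserver /]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1]
      292961280 blocks [2/1] [_U]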
[root@myserver /]# mount -n -o remount /
mount: block device /dev/VolGroup00/LogVolRoot is write-protected, mounting read-only
The LVM tools give an error unless --ignorelockingfailure is used (because they can’t write to /var), but they show the volume group as read/write:
[root@myserver /]# lvm vgdisplay
Locking type 1 initialisation failed.
[root@myserver /]# lvm pvdisplay --ignorelockingfailure
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               VolGroup00
  PV Size               279.36 GB / not usable 15.56 MB
  Allocatable           yes (but full)
[...]
[root@myserver /]# lvm vgdisplay --ignorelockingfailure
  --- Volume group ---
  VG Name               VolGroup00
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
[...]
[root@myserver /]# lvm lvdisplay /dev/VolGroup00/LogVolRoot --ignorelockingfailure
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVolRoot
  VG Name                VolGroup00
  LV UUID                PGoY0f-rXqj-xH4v-WMbw-jy6I-nE04-yZD3Gx
  LV Write Access        read/write
[...]
In this case /boot (a separate RAID meta-device) and /data (a different logical volume in the same volume group) are still writable. From previous occurrences I know that a restart will bring the system back up with a read/write root filesystem and a properly degraded RAID array.
1) When this occurs, how can I get the root filesystem back to read/write without a system restart?
2) What needs to be changed to stop this filesystem locking? With a RAID 1 failure on a single disk we don’t want the filesystems to lock up; we want the system to keep running until we can replace the bad disk.
Edit: I can see this in the dmesg output. Does this indicate a failure of /dev/sda, then a separate failure on /dev/sdb that led to the filesystem being set read-only?
sda: Current [descriptor]: sense key: Aborted Command
    Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
        72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 03 ce 85
end_request: I/O error, dev sda, sector 249477
raid1: Disk failure on sda2, disabling device.
        Operation continuing on 1 devices
ata1: EH complete
SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sda2
 disk 1, wo:0, o:1, dev:sdb2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdb2
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)
ata2.00: status: < DRDY ERR >
ata2.00: error: < ABRT >
ata2.00: configured for UDMA/133
ata2: EH complete
sdb: Current [descriptor]: sense key: Aborted Command
    Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
        72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 01 e3 5e 2d
end_request: I/O error, dev sdb, sector 31677997
Buffer I/O error on device dm-0, logical block 3933596
lost page write due to I/O error on dm-0
ata2: EH complete
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000008
ata2.00: cmd 61/38:00:f5:d6:03/00:00:00:00:00/40 tag 0 ncq 28672 out
         res 41/10:00:f5:d6:03/00:00:00:00:00/40 Emask 0x481 (invalid argument)
ata2.00: status: < DRDY ERR >
ata2.00: error: < IDNF >
ata2.00: configured for UDMA/133
sd 1:0:0:0: SCSI error: return code = 0x08000002
sdb: Current [descriptor]: sense key: Aborted Command
    Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
        72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 03 d6 f5
end_request: I/O error, dev sdb, sector 251637
ata2: EH complete
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
Aborting journal on device dm-0.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
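An aside, not part of the original report: to tell genuine media failures apart from transient controller or cabling errors, the drives' SMART attributes and self-test logs (via smartctl from smartmontools) are usually the quickest evidence:

smartctl -a /dev/sda          # health status, reallocated/pending sector counts, device error log
smartctl -t short /dev/sda    # start a short self-test; read the result later with: smartctl -l selftest /dev/sda
smartctl -a /dev/sdb          # repeat for the second drive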
Arch Linux
First post: For the last three days I’ve been going through the arch wiki, learning new things and installing arch for the first time.
Almost done with my installation, I tried installing intel-ucode, since I have an Intel CPU.
error: failed to init transaction (unable to lock database)
error: could not lock database: Read-only file system
I quickly realised that my root partition file system had become read-only.
# mkdir dir_name
mkdir: cannot create directory dir_name: Read-only file system
Which is weird, since I’ve been running around, creating and viming files left and right for the last few hours.
What is more, lsblk and fstab alike assure me that my root partition is not read-only:
# lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1p1 259:1    0  220G  0 part /
nvme0n1p3 259:3    0  600M  0 part /boot
The second one is my EFI partition, which I mounted to /boot (it was previously on /efi).
# cat /etc/fstab
UUID=uuid_here  /     ext4  rw,relatime  0 1
UUID=uuid_here  /efi  vfat  rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro  0 2
Note that fstab has not yet been updated and still thinks the EFI partition is mounted on /efi. Of course, I can’t update it if I can’t write to the filesystem, but I don’t think that has anything to do with the problem.
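Incidentally, neither of those checks is conclusive: lsblk's RO column reflects the block device's read-only flag rather than the mount options, and fstab only says what should be mounted. The kernel's actual mount options are visible with findmnt (or in /proc/mounts); the output below is illustrative of what a root remounted read-only would look like, not literal output from this machine:

# findmnt /
TARGET SOURCE         FSTYPE OPTIONS
/      /dev/nvme0n1p1 ext4   ro,relatime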
What do you make of this? What could have caused this?
P.S. Commands were executed from arch-chroot /mnt.
I’m still booting from my installation USB. For that reason I have been hesitant to reboot before fully completing the installation.
That said, the USB’s environment is not read-only, so there’s that.
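One more data point worth collecting (a suggestion, not something already tried above): ext4 remounts itself read-only when the kernel detects filesystem or I/O errors, depending on the errors= policy, and it logs the reason at that moment. From the live USB environment, a rough filter over the kernel log might show the trigger:

dmesg | grep -iE 'ext4|nvme|i/o error|read-only'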
Why does initramfs mount the root filesystem read-only
What is the reason for the root filesystem being mounted ro in the initramfs (and in initrd)? For example, the Gentoo initramfs guide mounts the root filesystem with:
mount -o ro /dev/sda1 /mnt/root
instead of:
mount -o rw /dev/sda1 /mnt/root
I can see that there is probably a good reason (and it probably involves switchroot), however it does not seem to be documented anywhere.
3 Answers
The initial ramdisk (initrd) is typically a stripped-down version of the root filesystem containing only that which is needed to mount the actual root filesystem and hand off booting to it.
The initrd exists because in modern systems, the boot loader can’t be made smart enough to find the root filesystem reliably. There are just too many possibilities for such a small program as the boot loader to cover. Consider NFS root, nonstandard RAID cards, etc. The boot loader has to do its work using only the BIOS plus whatever code can be crammed into the boot sector.
The initrd gets stored somewhere the boot loader can find, and it’s small enough that the extra bit of space it takes doesn’t usually bother anyone. (In small embedded systems, there usually is no «real» root, just the initrd.)
The initrd is precious: its contents have to be preserved under all conditions, because if the initrd breaks, the system cannot boot. One design choice its designers made to ensure this is to make the boot loader load the initrd read-only. There are other principles that work toward this, too, such as that in the case of small systems where there is no «real» root, you still mount separate /tmp , /var/cache and such for storing things. Changing the initrd is done only rarely, and then should be done very carefully.
Getting back to the normal case where there is a real root filesystem, it is initially mounted read-only because initrd was. It is then kept read-only as long as possible for much the same reasons. Any writing to the real root that does need to be done is put off until the system is booted up, by preference, or at least until late in the boot process when that preference cannot be satisfied.
The most important thing that happens during this read-only phase is that the root filesystem is checked to see if it was unmounted cleanly. That is something the boot loader could certainly do instead of leaving it to the initrd, but what then happens if the root filesystem wasn’t unmounted cleanly? Then it has to call fsck to check and possibly fix it. So, where would initrd get fsck , if it was responsible for this step instead of waiting until the handoff to the «real» root? You could say that you need to copy fsck into the initrd when building it, but now it’s bigger. And on top of that, which fsck will you copy? Linux systems regularly use a dozen or so different filesystems. Do you copy only the one needed for the real root at the time the initrd is created? Do you balloon the size of initrd by copying all available fsck.foo programs into it, in case the root filesystem later gets migrated to some other filesystem type, and someone forgets to rebuild the initrd?
The Linux boot system architects wisely chose not to burden the initrd with these problems. They delegated checking of the real root filesystem to the real root filesystem, since it is in a better position to do that than the initrd.
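To make that concrete, here is a rough sketch (not lifted from any particular distribution's scripts) of what early init on the real root traditionally does while / is still read-only:

# run by the real root's init scripts while / is still mounted read-only
fsck -a /dev/sda1            # example device name; normally taken from /etc/fstab
mount -n -o remount,rw /     # only after the check does the root become writable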
Once the boot process has proceeded far enough that it is safe to do so, the initrd gets swapped out from under the real root with pivot_root(8) , and the filesystem is remounted in read-write mode.
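In a modern initramfs that handoff is typically just a few lines at the end of its /init script; a simplified, busybox-style sketch (paths are illustrative, not copied from any specific distribution):

# inside the initramfs /init
mount -o ro /dev/sda1 /mnt/root          # mount the real root read-only, as discussed above
exec switch_root /mnt/root /sbin/init    # discard the initramfs and hand off; the real init remounts / read-write later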
root file system stays read-only
The second try also did not make the root fs any more writable than it was at the beginning, sadly. Is there something I am missing here (in the chain of mounting volumes after boot)? I would be very grateful for any kind of help.
1 Answer
Sometimes Linux mounts hard drives in a funny way. Just for reference: I have a Debian-based OS which randomly mounts /dev/sda as /dev/sdb and vice versa. I haven’t experienced any of those problems, and my fstab file includes the ro (read-only) parameter.
I believe the following part of fstab is what remounts the filesystem in ro mode: the errors=remount-ro option tells the kernel to do so as soon as filesystem errors are detected.

/dev/mapper/sda3_crypt / ext4 errors=remount-ro 0 1

(I have LUKS-encrypted partitions, so instead of /dev/sda3 I have /dev/mapper/sda3_crypt.)
That behaviour is there to prevent further damage to the filesystem once errors have been detected.
Edit: you shouldn’t alter your fstab in a way that removes that protection.
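As a side note that goes beyond the original answer: ext2/3/4 also keep a default error policy in the superblock, so the remount-ro behaviour can be set persistently with tune2fs even without the fstab option:

tune2fs -l /dev/mapper/sda3_crypt | grep -i 'errors behavior'   # show the current policy
tune2fs -e remount-ro /dev/mapper/sda3_crypt                    # make remount-ro the default for this filesystem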
Do you have a mechanical hard disk drive? If yes, try running fsck to check and fix your filesystem: fsck -fy /dev/sd(X)(Y), where X and Y are the disk and partition you are having trouble with.
To find your drive(s), run fdisk -l | more to list all your hard disk drives and partitions one screen at a time.
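For example, assuming the troublesome root filesystem turned out to be on /dev/sda2 (a made-up name, substitute the one from your fdisk listing), the check would be:

# run this from a live/rescue environment so the filesystem is not mounted read-write
fsck -fy /dev/sda2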
I hope this works well for you.