Linux RAID on SSD

Hybrid HDD + SSD RAID1

In general, hybrid RAID1 is RAID1 that mirrors data on two different storage technologies. Here we are talking about a HDD and an SSD. (Or more, if you want more than 2-way RAID1. Why would you want, for example, 3-way RAID1? Simple: if one disk fails, you still have redundancy, for the same reason as with RAID6. And that one disk will fail sooner or later.)

Why do it?

HDDs and SSDs have different characteristics. With a hybrid solution, you get some of the advantages of both. Let me briefly state the main characteristics of HDDs and SSDs. (+) is something positive, (o) is neutral and (-) is a disadvantage.

HDD:
  • (+) Proven technology, characteristics well understood
  • (+) Cheap
  • (+) Unlimited overwrites
  • (+) Small sectors
  • (o) Reasonable reliability, good data endurance
  • (o) Reasonable linear access speed
  • (-) Sensitive to mechanical shock and vibration
  • (-) High access latency

SSD:
  • (-) Unproven technology
  • (-) Expensive
  • (-) Limited overwrites (wear-leveling helps)
  • (-) Large sectors (100 kB and larger)
  • (-) Unknown reliability, unknown data endurance (but lower than HDDs)
  • (+) Good linear speed
  • (+) Not sensitive to mechanical shock and vibration
  • (+) Very low access latency

Why do I classify reliability and especially data endurance as unknown? Simple: SSDs with MLC (Multi-Level Cells) and wear-leveling have not been on the market long enough to know. Do not trust vendor marketing material. If you look at what vendors actually promise, you will notice that they give very few hard assurances. Data endurance in particular will be low: the best professional flash media are rated for 10 years, while magnetic media can reach 50 years. In a running system, you can increase endurance with regular complete reads; see the section about maintenance below.

Anyway, what you get if you combine an SSD and a HDD in a RAID1 is that all reads can come from the SSD at SSD speed, while you still have the redundancy of RAID1 without the need to buy a second, expensive SSD. Writes will still be at HDD speed, but writes are typically a lot rarer than reads. In addition, writes can be buffered, while reads cannot.

The second thing you get is two different storage technologies, and chances are good that whatever kills one will leave the other intact. For example, heat is much more likely to kill a HDD. A massive amount of small writes is much more likely to kill an SSD.

How to do it with Linux Software RAID

The trick is to create the RAID1 array with the HDD(s) marked as «write-mostly» at creation time. This causes the kernel to do (slow) reads from the HDD only when they are really needed. All other reads go to the SSD. The option was originally added for mirroring over a slow network link, but it works equally well for concentrating reads on an SSD.


Here is how to do it. Let us assume you want to RAID1 a HDD partition sdb6 and an SSD partition sdc6 as md1. (Substitute full-disk devices if needed; you can mix partitions and full disks.) The respective mdadm calls for a 2-way and a 3-way array (the latter adding a second HDD partition sda6) would be as follows:

# 2-way RAID1: SSD sdc6 plus HDD sdb6 marked write-mostly
mdadm --metadata=0.90 --create -n 2 -l 1 /dev/md1 /dev/sdc6 -W /dev/sdb6
# 3-way RAID1: SSD sdc6 plus two write-mostly HDD partitions sdb6 and sda6
mdadm --metadata=0.90 --create -n 3 -l 1 /dev/md1 /dev/sdc6 -W /dev/sdb6 -W /dev/sda6

Note that I specify the old 0.90 superblock format. The reason is that the «new» formats are broken in the sense that they do not offer kernel-level autodetection. RAID array assembly is the job of the RAID controller, and here that is the kernel. Why anybody thought it would be acceptable to require some userspace script to do it is beyond me. There are other problems with the «improved» formats, and unless you need them because they offer more disks per array, I recommend staying away from them.

After you have created the array, a subsequent check of /proc/mdstat should show a «(W)» after the HDD components. Here is an example from my set-up, with sdb6 and sdc6 as HDD partitions and sdd1 as an SSD partition (a triple RAID1):

cat /proc/mdstat
Personalities : [linear] [raid1] [raid6] [raid5] [raid4]
.
md6 : active raid1 sdc6[0](W) sdb6[1](W) sdd1[2]
      62508800 blocks [3/3] [UUU]
.

Observed read speeds are the same as for an SSD alone. Write speeds are comparable to a HDD-only RAID1, but only once the filesystem runs out of memory to buffer the writes. Since writes can often be buffered, a hybrid array is overall a lot faster than one with only HDDs.

How to do it for an existing RAID1

For an existing array, the flag can be toggled at runtime through sysfs. To set /dev/sdc6 from the last example to «write-mostly» (first line) or to clear the flag again (second line):

echo writemostly > /sys/block/md6/md/dev-sdc6/state
echo -writemostly > /sys/block/md6/md/dev-sdc6/state

If for some reason you cannot set a component of a RAID1 to «write-mostly» this way, you can kick it from the array and re-add it with the write-mostly flag active. This will temporarily lower your redundancy level, so a backup before doing this is recommended.

To kick the component, first set it to «faulty», then remove it and re-add it with the flag:

mdadm --fail /dev/md6 /dev/sdc6
mdadm --remove /dev/md6 /dev/sdc6
mdadm --add /dev/md6 --write-mostly /dev/sdc6
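
Whether the flag is set can be verified right away; a quick check, using the md6/sdc6 names from the example above:

# the state file lists «write_mostly» for a flagged component,
# and /proc/mdstat shows a «(W)» after it
cat /sys/block/md6/md/dev-sdc6/state
grep '(W)' /proc/mdstat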

Maintenance

There are two aspects to storage maintenance with RAID: RAID maintenance and storage device maintenance.

Both have the goal of detecting problems early, while there is still a chance to correct them, and of notifying you in time when it looks like manual intervention is needed. Still, keep in mind that RAID is not backup. It only covers some of the areas a backup covers, but not all. For example, user error and malware are not covered by RAID. Your computer being hit by lightning is also not covered. You do need a backup in addition. What RAID gives you is a lower probability of needing that backup, hence the restore process can be allowed to take more effort, which makes it cheaper. Or you simply have the hassle far less often.


RAID consistency checks

I recommend running a RAID consistency check every 7 to 15 days. The way to run it is a bit obscure. Basically, you read «/sys/block/mdx/md/mismatch_cnt» (substitute your md device for «mdx») beforehand to make sure it is zero. Then you write the string «check» into «/sys/block/mdx/md/sync_action» (replace «mdx» as before) and wait until it no longer reports «check». Then you read the mismatch count again and make sure it is still zero.
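
For illustration, here is that procedure as a minimal shell sketch (md6 is just an example array name, adjust it; the author's own tool is the Python script mentioned below):

#!/bin/sh
# minimal sketch of the consistency-check procedure described above
MD=md6
[ "$(cat /sys/block/$MD/md/mismatch_cnt)" = "0" ] || echo "$MD: mismatches present before check!"
echo check > /sys/block/$MD/md/sync_action
# wait until the check is no longer running
while [ "$(cat /sys/block/$MD/md/sync_action)" = "check" ]; do
    sleep 60
done
# any output goes to stdout, so cron can mail it
[ "$(cat /sys/block/$MD/md/mismatch_cnt)" = "0" ] || echo "$MD: mismatches found, investigate!"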

Here is a Python script I wrote that does this, md_check.py; just adjust the configured device at the start and use it from cron like this:

# check array with SSD
33 6 * * * /root/sys_tools/mdadm/md_check.py

This script can be used for other RAID arrays as well, not just for RAID1 or hybrid arrays. I run it twice a month from cron for each of my RAID arrays (I currently have 8), with a maximum of one check per day.

Make sure cron can send email to you or you will not be notified in case of errors. That would make the check basically worthless.

It is also possible that your distribution already does this check automatically. Debian does, but only once a month and with a pretty convoluted script that only works sometimes. It also seems to be missing any meaningful reporting, which makes the check worthless. In my experience, the only reporting that works is a check that sends email to an address that is read regularly. For faster alerting, use a mailbox system that notifies you via text message, or send the email to your mobile phone in the first place. Forget about anything else; it just does not work. Email is the base mechanism to use. It also has the added advantage that you can either send it directly or print a message to stdout when called from cron, and cron will send the email for you. That is one reason any real sysadmin makes reliable email sending a top priority.
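
For example, a MAILTO line at the top of the crontab is enough to have cron mail any output of the jobs below it (the address here is of course a placeholder):

MAILTO=admin@example.com
# check array with SSD
33 6 * * * /root/sys_tools/mdadm/md_check.py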

SMART selftests

The second thing that should be done regularly is a full device read test. I do it every 14 days. For HDDs, you can run a long SMART selftest, e.g. with smartd, or manually from cron. Make sure you have smartd configured and working to catch errors! smartd also needs to be able to send email to you, otherwise all monitoring is basically worthless.
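
For reference, a long selftest can also be started and inspected manually with smartctl; a sketch, with sdb standing in for one of the HDDs:

# start a long (full-surface) selftest; the drive runs it in the background
smartctl -t long /dev/sdb
# once it has finished, inspect the selftest log and the overall health status
smartctl -l selftest /dev/sdb
smartctl -H /dev/sdb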



We leave 12% of each disk unallocated as spare free blocks (over-provisioning) so that the SSD does not wear out prematurely.
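
One way to do that is simply to make the RAID partition smaller than the disk. A sketch with parted (the 88% figure mirrors the 12% reserve; device names follow the example below, double-check them before wiping anything):

# create a GPT label and one partition covering 88% of the disk,
# leaving ~12% unallocated as extra spare area for the SSD controller
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 1MiB 88%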

mdadm --create /dev/md2 --level=1 --chunk=128 --raid-devices=2 /dev/sdc1 /dev/sdd1

Then add the updated array definition from the output of mdadm --detail --scan to /etc/mdadm/mdadm.conf.
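
A common way to do that (a sketch; on other distributions the initramfs step differs):

# append the scanned array definitions, then prune duplicates by hand
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# on Debian/Ubuntu, refresh the initramfs so the array is assembled at boot
update-initramfs -u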

mkfs.ext4 -O extent -b 4096 -E stride=128,stripe-width=128 /dev/md2

The /etc/fstab entry for the new array:

/dev/md2  /home  ext4  discard,noatime,errors=remount-ro  0  1

:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdd1[1] sdc1[0]
      52428216 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb3[0] sda3[1]
      929012672 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
      499904 blocks [2/2] [UU]
unused devices: <none>

To optimize access to the SSDs, install sysfsutils (apt-get install sysfsutils) and add the following to /etc/sysfs.conf:

block/sdc/queue/scheduler = noop
block/sdd/queue/scheduler = noop
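
Whether the setting is active can be checked directly; the scheduler shown in square brackets is the one in use:

cat /sys/block/sdc/queue/scheduler
cat /sys/block/sdd/queue/scheduler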

What seems odd:
— Why RAID1 on SSDs at all?
— What kind of 60 GB SSDs are these? MLC?
— Why a chunk size for RAID1? Planning ahead? (I just pictured a RAID10 made of SSDs.)


It is simply mirroring of everything; the SSDs are MLC; the chunk_size was set that way because SSDs are used, following the recommendations from the articles referenced above.

In a RAID1 the wear will be even, so there is little point in the mirror: the drives will die almost simultaneously.
Chunks are needed for striped layouts (on HDDs), not for a mirror.

We simply followed the advice from the articles; that applies to the chunk_size.

The sarcasm about mirroring SSDs is out of place when we are talking about a data center located far abroad that hosts a project bringing in serious revenue. The hardware is not insured against flooding, hammer blows, or various mechanical and electromagnetic impacts, so we do not consider an SSD array anything out of the ordinary.

What especially caught the eye:

In addition, keep in mind that MD (software raid) does not support discards.

And it does not hurt to check whether TRIM actually works:

#!/bin/bash
#
# Test if TRIM is working on your SSD. Tested only with EXT4 filesystems
# in Ubuntu 11.10 and Fedora 16. This script is simply an automation of
# the procedures described by Nicolay Doytchev here:
#
# https://sites.google.com/site/lightrush/random-1/checkiftrimonext4isenabledandworking
#
# Author: Dorian Bolivar
# Date: 20120129
#

if [ $# -ne 3 ]; then
    echo
    echo "Usage: $0 <file> <size> <device>"
    echo
    echo "  <file>   is a temporary file for the test"
    echo "  <size>   is the file size in MB"
    echo "  <device> is the device being tested, e.g. /dev/sda"
    echo
    echo "Example: $0 tempfile 5 /dev/sda"
    echo
    echo "This would run the test for /dev/sda creating a"
    echo "temporary file named \"tempfile\" with 5 MB"
    echo
    exit 1
fi

FILE="$1"
SIZE=$2
DEVICE="$3"

# Create the temporary file
dd if=/dev/urandom of="$FILE" count=1 bs=${SIZE}M oflag=direct
sync

# Get the address of the first sector
hdparm --fibmap "$FILE"
SECTOR=`hdparm --fibmap "$FILE" | tail -n1 | awk '{ print $2; }'`

# Read the first sector prior to deletion
hdparm --read-sector $SECTOR "$DEVICE"
echo
echo "This is a sector of the file. It should have been successfully read"
echo "and show a bunch of random data."
echo
read -n 1 -p "Press any key to continue. "

# Delete the file and re-read the sector
rm -f "$FILE"
sync
echo
echo "File deleted. Sleeping for 120 seconds before re-reading the sector."
echo "If TRIM is working, you should see all 0s now."
sleep 120
hdparm --read-sector $SECTOR "$DEVICE"

echo
echo "If the sector isn't filled with 0s, something is wrong with your"
echo "configuration. Try googling for \"TRIM SSD Linux\"."
echo

exit 0

