The md system has the following functionality available:
echo check > /sys/block/mdX/md/sync_action
echo repair > /sys/block/mdX/md/sync_action
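As a minimal sketch of how these two commands might be driven from a script (for example a root cron job), the function below writes the requested action to every md array whose sync_action currently reads "idle". The function name start_scrub and the SYSFS_ROOT override are illustrative assumptions made here so the logic can be tried against a scratch directory; they are not part of md or mdadm.

```shell
#!/bin/sh
# Sketch only: start a scrub on every md array that is currently idle.
# SYSFS_ROOT is a hypothetical override for testing; leave it unset on a
# real system so that /sys is used.
start_scrub() {
    action="${1:-check}"                 # "check" or "repair"
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/md*/md/sync_action; do
        [ -e "$f" ] || continue          # glob did not match: no md arrays
        state=$(cat "$f")
        if [ "$state" = "idle" ]; then
            echo "$action" > "$f"
            echo "started $action on ${f%/md/sync_action}"
        else
            # Do not interrupt a resync, recovery, or scrub already running.
            echo "skipped ${f%/md/sync_action} (currently: $state)"
        fi
    done
}
```

Calling start_scrub check from a weekly cron entry would be one way to use it; start_scrub repair works the same way but may write to the array.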
A recent discussion between Roy Waldspurger and Neil Brown:
On a RAID5, and soon a RAID6, I'm looking to set up a cron job, and am trying to figure out what exactly to schedule. The answers to the following questions might shed some light on this:
1. GENERALLY SPEAKING, WHAT IS THE DIFFERENCE BETWEEN THE "CHECK" AND "REPAIR" COMMANDS?
The md.txt doc mentions for "check" that "a repair may also happen for some raid levels."
Which RAID levels, and in what cases? If I perform a "check" is there a cache of bad blocks that need to be fixed that can quickly be repaired by executing the "repair" command? Or would it go through the entire array again? I'm working with new drives, and haven't come across any bad blocks to test this with.
check just reads everything and doesn't trigger any writes unless a read error is detected, in which case the normal read-error handling kicks in. So it can be useful on a read-only array.
repair does the same, but when it finds an inconsistency it corrects it by writing something. If any raid personality had not been taught to specifically understand check, then a check run would effect a repair. I think 2.6.17 will have all personalities doing the right thing.
check doesn't keep a record of problems, just a count. repair will reprocess the whole array.
2. CAN "CHECK" BE RUN ON A DEGRADED ARRAY (say with N out of N+1 disks on a RAID level 5)? I can test this out, but was it designed to do this, versus "REPAIR" only working on a full set of active drives? Perhaps "repair" is assuming that I have N+1 disks so that parity can be WRITTEN?
No, check on a degraded raid5, or a raid6 with 2 missing devices, or a raid1 with only one device will not do anything. It will terminate immediately. After all, there is nothing useful that it can do.
3. RE: FEEDBACK/LOGGING: it seems that I might see some messages in dmesg logging output such as "raid5:read error corrected!", is that right? I realize that "mismatch_count" can also be used to see if there was any "action" during a "check" or "repair." I'm assuming this stuff doesn't make its way into an email.
You are correct on all counts. mdadm --monitor doesn't know about this yet. ((writes notes in mdadm todo list)).
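Since nothing is mailed automatically, one possible workaround is to print the counters from a cron job and let cron mail the output. The sketch below assumes the counter is exposed as mismatch_cnt next to sync_action in each array's md directory (that is the attribute name used in the kernel's md documentation); report_scrub and SYSFS_ROOT are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: print one status line per md array.
# SYSFS_ROOT is a hypothetical override for testing; the default is /sys.
report_scrub() {
    root="${SYSFS_ROOT:-/sys}"
    for d in "$root"/block/md*/md; do
        [ -r "$d/mismatch_cnt" ] || continue   # no arrays, or no counter
        name=${d%/md}                          # .../block/md0
        name=${name##*/}                       # md0
        echo "$name: sync_action=$(cat "$d/sync_action") mismatch_cnt=$(cat "$d/mismatch_cnt")"
    done
}
```

Run after a check has finished, this only reports the count; as noted above, md keeps no record of which blocks mismatched.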
4. DOES "REPAIR" PERFORM READS TO CHECK THE ARRAY, AND THEN WRITE TO THE ARRAY *ONLY WHEN NECESSARY* TO PERFORM FIXES FOR CERTAIN BLOCKS? (I know, it's sorta a repeat of questions 1 and 2).
repair only writes when necessary. In the normal case, it will only read every block.
5. IS THERE ILL-EFFECT TO STOP EITHER "CHECK" OR "REPAIR" BY ISSUING "IDLE"?
6. IS IT AT ALL POSSIBLE TO CHECK A CERTAIN RANGE OF BLOCKS? And to keep track of which blocks were checked? The motivation is to start checking some blocks overnight, and to pick-up where I left off the next night...
Not yet. It might be possible one day.
7. ANY OTHER CONSIDERATIONS WHEN "SCRUBBING" THE RAID?
Not that I am aware of.
Starting with version 2.6, the Linux kernel offers a choice of I/O schedulers. The anticipatory scheduler seems to be sub-optimal under heavy loads (e.g. a resync). If your kernel has the CFQ scheduler compiled in, it can be used during a resync.
From the command line you can see which schedulers are supported (the active one is shown in brackets) and switch to another on the fly. Remember to do this for every device that is part of the RAID:
# cat /sys/block/hda/queue/scheduler
noop [anticipatory] deadline cfq
# echo cfq > /sys/block/hda/queue/scheduler
Otherwise you can recompile your kernel and set CFQ as the default I/O scheduler (CONFIG_DEFAULT_CFQ=y in Block layer, IO Schedulers, Default I/O scheduler), or simply pass elevator=cfq on the kernel command line at boot time (see the Documentation/kernel-parameters.txt document corresponding to your kernel version).
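To apply the runtime change to several member disks in one go, something like the following sketch could be used. It takes the scheduler name followed by whole-disk names (hda, hdb, ...), and only writes to a disk's queue/scheduler file if that scheduler is listed there. The name set_scheduler and the SYSFS_ROOT override are assumptions made for illustration; the disk names are whatever devices make up your array.

```shell
#!/bin/sh
# Sketch only: select an I/O scheduler on each named whole-disk device.
# Usage: set_scheduler cfq hda hdb hdc
# SYSFS_ROOT is a hypothetical override for testing; the default is /sys.
set_scheduler() {
    sched="$1"
    shift
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$@"; do
        f="$root/block/$dev/queue/scheduler"
        if [ ! -w "$f" ]; then
            echo "skipped $dev: $f is not writable" >&2
            continue
        fi
        # The file lists the available schedulers, with the active one in
        # brackets; strip the brackets and look for an exact name match.
        if tr -d '[]' < "$f" | tr ' ' '\n' | grep -qx "$sched"; then
            echo "$sched" > "$f"
        else
            echo "skipped $dev: $sched is not available" >&2
        fi
    done
}
```

The setting does not survive a reboot, so for a permanent change the kernel configuration or the elevator= boot parameter described above is the better fit.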