RAID Recovery
Revision as of 01:24, 1 December 2010
When Things Go Wrong
There are two kinds of failure in RAID systems: failures that reduce resilience and failures that prevent the RAID device from operating.
Normally a single disk failure will degrade the RAID device, but it will continue operating (that is the point of RAID, after all).
However, there will come a point when enough component devices fail that the RAID device stops working.
If this happens then first of all: don't panic. Seriously. Don't rush into anything; don't issue any commands that will write to the disks (like <code>mdadm -C</code>, <code>fsck</code> or even <code>mount</code>).
The first thing to do is to start preserving information. You'll need data from <code>/var/log/messages</code>, <code>dmesg</code>, etc.
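A minimal sketch of that first step, snapshotting the volatile state into a directory before it scrolls out of the logs. The directory and file names are illustrative, not part of any standard tool, and each capture is best-effort since some sources need root or may be absent:

```shell
#!/bin/sh
# Snapshot volatile RAID state before doing anything destructive.
# The directory name is illustrative; adjust to taste.
dir="raid-failure-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$dir"
# Best-effort captures: some of these need root or may be absent,
# so failures are tolerated rather than aborting the snapshot.
dmesg              > "$dir/dmesg.txt"  2>/dev/null || true
cat /proc/mdstat   > "$dir/mdstat.txt" 2>/dev/null || true
cp /var/log/messages "$dir/"           2>/dev/null || true
echo "state saved in $dir"
```

Keep that directory somewhere off the failed array, obviously.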
Recreating an array
When an array is created, the data areas are not written to, *provided* the array is created in degraded mode, that is, with a 'missing' device.
So if you somehow screw up your array and can't remember how it was originally created, you can re-run the create command using various permutations until the data is readable.
This Perl script is an untested prototype: permute_array.pl
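To make the permutation idea concrete, here is a dry-run sketch for a small case: two surviving devices plus 'missing' across the three slots of a degraded RAID5. Device names are examples, and nothing here touches a disk; the candidate <code>mdadm --create</code> commands are only collected in a file for inspection (over more disks, automating this search is what permute_array.pl is for):

```shell
#!/bin/sh
# List every candidate device order for a 3-slot degraded RAID5
# built from two surviving devices plus 'missing'. Nothing is run;
# candidate commands are written to a file to be tried one at a time.
: > candidates.txt
for a in /dev/sdb1 /dev/sdc1 missing; do
  for b in /dev/sdb1 /dev/sdc1 missing; do
    for c in /dev/sdb1 /dev/sdc1 missing; do
      # skip orders that use the same entry twice
      [ "$a" = "$b" ] && continue
      [ "$a" = "$c" ] && continue
      [ "$b" = "$c" ] && continue
      echo "mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=3 $a $b $c" >> candidates.txt
    done
  done
done
cat candidates.txt   # 6 permutations to try, checking readability after each
```

After each candidate create, check whether the data is readable (e.g. with a read-only <code>fsck -n</code>) before trying the next order.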
Preserving RAID superblock information
One of the most useful things to do first, when trying to recover a broken RAID array, is to preserve the information reported in the RAID superblocks on each device at the time the array went down (and before you start trying to recreate the array). Something like
mdadm --examine /dev/sd[bcdefghijklmn]1 > raid.status
(adjust this to suit your drives) creates a file, raid.status, which is a sequential listing of the <code>mdadm --examine</code> output for all the RAID devices on my system, in order. The file should also still be there five minutes later when we start messing with <code>mdadm --create</code>, which is the point.
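The dump is worth a second look before any recreation: each device records an event count and update time, and comparing them shows which devices dropped out of the array first. A sketch against sample examine output (field layout as in typical <code>mdadm --examine</code> output; the values are invented):

```shell
#!/bin/sh
# Pull per-device event counts and update times out of an examine dump.
# A real run would grep the raid.status file made above; sample input
# with invented values stands in for it here.
cat > raid.status.sample <<'EOF'
/dev/sdb1:
    Update Time : Wed Dec  1 01:00:00 2010
         Events : 154735
/dev/sdc1:
    Update Time : Wed Dec  1 00:45:00 2010
         Events : 154720
EOF
grep -E 'Update Time|Events' raid.status.sample
```

Devices with the lowest event counts and oldest update times left the array earliest, and their data is the most stale.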
Restore array by recreating (after multiple device failure)
This section applies to a RAID5 that has temporarily lost more than one device, or a RAID6 that has lost more than two devices, and cannot be assembled without using force (you probably don't want to do that) because the devices are out of sync. It assumes that the devices themselves are available, the data is on them, and that our "failure" is e.g. the loss of a controller taking out four drives. The author recently dealt with precisely this scenario during a reshape, and found a lot of information in the mailing list archives that he will aim to reproduce here, with examples.
For the sake of example, assume we have a ten-disk RAID6 that's already lost two drives and is 80% of the way through a reshape, when we suddenly lose four drives at a stroke. Our broken array now looks like this:
Array State : A......AAA ('A' == active, '.' == missing)
with the four drives that went south looking like this:
Array State : AAAAA..AAA ('A' == active, '.' == missing)
and also having a lower event count. An attempt at assembling the array tells us that we have four drives out of ten, not enough to start the array. At this point, we have no option but to recreate the array, which involves telling <code>mdadm --create</code> which devices to use in which slots in order to put the array back together the way it was before. Assuming you made a dump of <code>mdadm --examine</code> output as described above, you can do something like:
grep Role raid.status
and you will get output such as:
Device Role : Active device 0
Device Role : Active device 1
Device Role : Active device 2
Device Role : Active device 3
Device Role : Active device 4
Device Role : spare
Device Role : spare
Device Role : spare
Device Role : Active device 9
Device Role : Active device 8
Device Role : Active device 7
Device Role : spare
Device Role : spare
(example output comes from a 10-drive RAID6 as described above). Knowing that this list starts with <code>/dev/sdb1</code> and works its way sequentially through to <code>/dev/sdn1</code>, we can work out that slots 0 to 4 are filled by <code>/dev/sd[bcdef]1</code>, slots 5 and 6 are missing, and slots 7, 8 and 9 are filled by <code>/dev/sd[lkj]1</code> in that order. This is what <code>mdadm --create</code> needs to know to put the array back together in the order it was created. So, in our example case, the command to recreate the array was:
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=10 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 missing missing /dev/sdl1 /dev/sdk1 /dev/sdj1
This duly told us what it found on the disks, and asked for confirmation. Please make sure you've preserved that <code>mdadm --examine</code> output before you give that confirmation, just in case you screw up. Once you're sure, go ahead and create the array. With any luck, your array will be created and assembled in degraded mode; we were then able to mount the ext4 filesystem and verify that we'd got it right before adding the other devices back in as spares, triggering a conventional RAID recovery.
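The slot bookkeeping above can also be automated rather than done by eye. A sketch (the sample input mimics the 10-drive example; a real run would read the raid.status made earlier): pair each <code>/dev/...:</code> header in the examine dump with the Device Role line that follows it, and print the active slots in order.

```shell
#!/bin/sh
# Map examine output to slot order: remember the device name from each
# "/dev/...:" header, then print "slot device" for each Active role.
# Sample input stands in for a real raid.status dump.
cat > raid.status.sample <<'EOF'
/dev/sdb1:
   Device Role : Active device 0
/dev/sdc1:
   Device Role : Active device 1
/dev/sdl1:
   Device Role : Active device 9
/dev/sdk1:
   Device Role : Active device 8
/dev/sdg1:
   Device Role : spare
EOF
awk '/^\/dev\// { dev = $1; sub(":", "", dev) }
     /Device Role : Active device/ { print $NF, dev }' raid.status.sample |
  sort -n     # slot number, then the device filling that slot
```

Reading the sorted output top to bottom gives the device list for <code>mdadm --create</code>, with any gaps in the slot numbers filled by the word 'missing'.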