RAID Recovery
Revision as of 01:24, 1 December 2010
When Things Go Wrong
There are two kinds of failure in RAID systems: failures that reduce resilience and failures that prevent the RAID device from operating.
Normally a single disk failure will degrade the RAID device, but it will continue operating (that is the point of RAID, after all).
However, there will come a point when enough component devices fail that the RAID device stops working.
If this happens then first of all: don't panic. Seriously. Don't rush into anything; don't issue any commands that will write to the disks (like <code>mdadm -C</code>, <code>fsck</code> or even <code>mount</code>).
The first thing to do is to start preserving information. You'll need data from <code>/var/log/messages</code>, <code>dmesg</code>, etc.
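A minimal sketch of that first step, snapshotting the volatile state into a directory before it scrolls out of the logs. The directory and file names are illustrative, not part of any standard tool, and each capture is best-effort since some sources need root or may be absent:

```shell
#!/bin/sh
# Snapshot volatile RAID state before doing anything destructive.
# The directory name is illustrative; adjust to taste.
dir="raid-failure-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$dir"
# Best-effort captures: some of these need root or may be absent,
# so failures are tolerated rather than aborting the snapshot.
dmesg              > "$dir/dmesg.txt"  2>/dev/null || true
cat /proc/mdstat   > "$dir/mdstat.txt" 2>/dev/null || true
cp /var/log/messages "$dir/"           2>/dev/null || true
echo "state saved in $dir"
```

Keep that directory somewhere off the failed array, obviously.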
Recreating an array
When an array is created, the data areas are not written to, *provided* the array is created in degraded mode, that is, with a 'missing' device.
So if you somehow screw up your array and can't remember how it was originally created, you can re-run the create command using various permutations until the data is readable.
This Perl script is an untested prototype: permute_array.pl
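To make the permutation idea concrete, here is a dry-run sketch for a small case: two surviving devices plus 'missing' across the three slots of a degraded RAID5. Device names are examples, and nothing here touches a disk; the candidate <code>mdadm --create</code> commands are only collected in a file for inspection (over more disks, automating this search is what permute_array.pl is for):

```shell
#!/bin/sh
# List every candidate device order for a 3-slot degraded RAID5
# built from two surviving devices plus 'missing'. Nothing is run;
# candidate commands are written to a file to be tried one at a time.
: > candidates.txt
for a in /dev/sdb1 /dev/sdc1 missing; do
  for b in /dev/sdb1 /dev/sdc1 missing; do
    for c in /dev/sdb1 /dev/sdc1 missing; do
      # skip orders that use the same entry twice
      [ "$a" = "$b" ] && continue
      [ "$a" = "$c" ] && continue
      [ "$b" = "$c" ] && continue
      echo "mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=3 $a $b $c" >> candidates.txt
    done
  done
done
cat candidates.txt   # 6 permutations to try, checking readability after each
```

After each candidate create, check whether the data is readable (e.g. with a read-only <code>fsck -n</code>) before trying the next order.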
Preserving RAID superblock information
One of the most useful things to do first, when trying to recover a broken RAID array, is to preserve the information reported in the RAID superblocks on each device at the time the array went down (and before you start trying to recreate the array). Something like
mdadm --examine /dev/sd[bcdefghijklmn]1 > raid.status
(adjust this to suit your drives) creates a file, raid.status, which is a sequential listing of the <code>mdadm --examine</code> output for all the RAID devices on my system, in order. The file should also still be there five minutes later when we start messing with <code>mdadm --create</code>, which is the point.
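The dump is worth a second look before any recreation: each device records an event count and update time, and comparing them shows which devices dropped out of the array first. A sketch against sample examine output (field layout as in typical <code>mdadm --examine</code> output; the values are invented):

```shell
#!/bin/sh
# Pull per-device event counts and update times out of an examine dump.
# A real run would grep the raid.status file made above; sample input
# with invented values stands in for it here.
cat > raid.status.sample <<'EOF'
/dev/sdb1:
    Update Time : Wed Dec  1 01:00:00 2010
         Events : 154735
/dev/sdc1:
    Update Time : Wed Dec  1 00:45:00 2010
         Events : 154720
EOF
grep -E 'Update Time|Events' raid.status.sample
```

Devices with the lowest event counts and oldest update times left the array earliest, and their data is the most stale.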
Restore array by recreating (after multiple device failure)
This section applies to a RAID5 that has temporarily lost more than one device, or a RAID6 that has lost more than two devices, and cannot be assembled without using force (you probably don't want to do that) because the devices are out of sync. It assumes that the devices themselves are available, the data is on them, and that our "failure" is e.g. the loss of a controller taking out four drives. The author recently dealt with precisely this scenario during a reshape, and found a lot of information in the mailing list archives that he will aim to reproduce here, with examples.
For the sake of example, assume we have a ten-disk RAID6 that's already lost two drives and is 80% of the way through a reshape, when we suddenly lose four drives at a stroke. Our broken array now looks like this:
Array State : A......AAA ('A' == active, '.' == missing)
with the four drives that went south looking like this:
Array State : AAAAA..AAA ('A' == active, '.' == missing)
and also having a lower event count. An attempt at assembling the array tells us that we have four drives out of ten, not enough to start the array. At this point, we have no option but to recreate the array, which involves telling <code>mdadm --create</code> which devices to use in which slots in order to put the array back together the way it was before. Assuming you made a dump of <code>mdadm --examine</code> output as described above, you can do something like:
grep Role raid.status
and you will get output such as:
Device Role : Active device 0
Device Role : Active device 1
Device Role : Active device 2
Device Role : Active device 3
Device Role : Active device 4
Device Role : spare
Device Role : spare
Device Role : spare
Device Role : Active device 9
Device Role : Active device 8
Device Role : Active device 7
Device Role : spare
Device Role : spare
(example output comes from a 10-drive RAID6 as described above). Knowing that this list starts with <code>/dev/sdb1</code> and works its way sequentially through to <code>/dev/sdn1</code>, we can work out that slots 0 to 4 are filled by <code>/dev/sd[bcdef]1</code>, slots 5 and 6 are missing, and slots 7, 8 and 9 are filled by <code>/dev/sd[lkj]1</code> in that order. This is what <code>mdadm --create</code> needs to know to put the array back together in the order it was created. So, in our example case, the command to recreate the array was:
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=10 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 missing missing /dev/sdl1 /dev/sdk1 /dev/sdj1
This duly told us what it found on the disks, and asked for confirmation. Please make sure you've preserved that <code>mdadm --examine</code> output before you give that confirmation, just in case you screw up. Once you're sure, go ahead and create the array. With any luck, your array will be created and assembled in degraded mode; we were then able to mount the ext4 filesystem and verify that we'd got it right before adding the other devices back in as spares, triggering a conventional RAID recovery.
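The slot bookkeeping above can also be automated rather than done by eye. A sketch (the sample input mimics the 10-drive example; a real run would read the raid.status made earlier): pair each <code>/dev/...:</code> header in the examine dump with the Device Role line that follows it, and print the active slots in order.

```shell
#!/bin/sh
# Map examine output to slot order: remember the device name from each
# "/dev/...:" header, then print "slot device" for each Active role.
# Sample input stands in for a real raid.status dump.
cat > raid.status.sample <<'EOF'
/dev/sdb1:
   Device Role : Active device 0
/dev/sdc1:
   Device Role : Active device 1
/dev/sdl1:
   Device Role : Active device 9
/dev/sdk1:
   Device Role : Active device 8
/dev/sdg1:
   Device Role : spare
EOF
awk '/^\/dev\// { dev = $1; sub(":", "", dev) }
     /Device Role : Active device/ { print $NF, dev }' raid.status.sample |
  sort -n     # slot number, then the device filling that slot
```

Reading the sorted output top to bottom gives the device list for <code>mdadm --create</code>, with any gaps in the slot numbers filled by the word 'missing'.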