Recovering a failed software RAID


Notice: This page is obsolete. Well worth reading, but read it in conjunction with the main page "Linux_Raid#When_Things_Go_Wrogn"

Notice: The pages "RAID Recovery" and "Recovering a failed software_RAID" both cover this topic. "Recovering a failed software_RAID" is safe to do as it does not make any changes to the RAID - except in the final stage.

The software RAID in Linux is well tested, but even with well tested software, RAID can fail.

In the following it is assumed that you have a software RAID where one more disk than the redundancy can cover has failed.

So your /proc/mdstat looks something like this:

 md0 : active raid6 sdn1[6](S) sdm1[5] sdk1[3](F) sdj1[2] sdh1[1](F) sdg1[0](F)
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/2] [__U_U]

Here is a RAID6 that has lost 3 harddisks.

Before you try this document on real data, you might want to try it out on a bunch of USB-sticks. This will familiarize you with the procedure without any risk of losing data.
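If you want to experiment first, a small test array matching the scenario below can also be built from loop-backed image files instead of USB-sticks (a sketch; the file names, sizes and the use of /dev/md0 are just assumptions for the exercise):

 # create six ~100MB backing files and attach them to loop devices
 for i in 1 2 3 4 5 6; do truncate -s 100M practice-disk$i.img; done
 DISKS=$(for i in 1 2 3 4 5 6; do losetup -f --show practice-disk$i.img; done)
 # build a 5-device RAID6 with one spare, as in the example below
 mdadm --create /dev/md0 --level=6 --raid-devices=5 --spare-devices=1 $DISKS

You can then fail devices with 'mdadm /dev/md0 --fail' to reproduce the states described in the next section.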


== Setting the scene ==

This article will deal with the following case. It starts out as a perfect RAID6 (state 1):

 md0 : active raid6 sdn1[6](S) sdm1[5] sdk1[3] sdj1[2] sdh1[1] sdg1[0]
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

For some unknown reason /dev/sdk1 fails and rebuild starts on the spare /dev/sdn1 (state 2):

 md0 : active raid6 sdn1[6] sdm1[5] sdk1[3](F) sdj1[2] sdh1[1] sdg1[0]
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UUU_U]
     [===>.................]  recovery = 16.0% (16744/101888) finish=1.7min speed=797K/sec

During the rebuild /dev/sdg1 fails, too. Now all redundancy is lost, and losing another data disk will fail the RAID. The rebuild on /dev/sdn1 continues (state 3):

 md0 : active raid6 sdn1[6] sdm1[5] sdk1[3](F) sdj1[2] sdh1[1] sdg1[0](F)
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/3] [_UU_U]
     [===========>.........]  recovery = 59.0% (60900/101888) finish=0.6min speed=1018K/sec

Before the rebuild finishes, yet another data harddisk (/dev/sdh1) fails, thus failing the RAID. The rebuild on /dev/sdn1 cannot continue, so /dev/sdn1 reverts to its status as spare (state 4):

 md0 : active raid6 sdn1[6](S) sdm1[5] sdk1[3](F) sdj1[2] sdh1[1](F) sdg1[0](F)
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/2] [__U_U]

This is the situation we are going to recover from. The goal is to get back to state 3 with minimal data loss.

== Tools ==

We will be using the following tools:

GNU Parallel - http://www.gnu.org/software/parallel/ If it is not packaged for your distribution, install it with:

 wget -O - pi.dk/3 | bash

GNU ddrescue - http://www.gnu.org/software/ddrescue/ddrescue.html
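On Debian-derived systems GNU ddrescue is usually packaged under the name gddrescue (an assumption; check the package name for your distribution):

 apt-get install gddrescue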


== Identifying the RAID ==

We will need the UUID of the array to identify the harddisks. This is especially important if you have multiple RAIDs connected to the system. Take the UUID from one of the non-failed harddisks (here /dev/sdj1):

 $ UUID=$(mdadm -E /dev/sdj1|perl -ne '/Array UUID : (\S+)/ and print $1')
 $ echo $UUID
 ef1de98a:35abe6d9:bcfa355a:d30dfc24

The failed harddisks are right now kicked off by the kernel and not visible anymore, so you need to make the kernel re-discover the devices. That can be done by re-seating the harddisks (if they are hotswap) or by rebooting. After the re-seating/rebooting the failed harddisks will often be given different device names.
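If the harddisks sit on a SCSI/SATA controller, it is sometimes possible to make the kernel re-discover re-seated harddisks without a reboot by rescanning the host adapters (a sketch; whether this works depends on your controller and driver):

 # ask every SCSI host adapter to rescan for devices
 for host in /sys/class/scsi_host/host*; do echo '- - -' > $host/scan; done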

We use the $UUID to identify the new device names:

 $ DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})
 {5}     mdadm: cannot open /dev/{5}: No such file or directory
 sda1    mdadm: No md superblock detected on /dev/sda1.
 sdb1    mdadm: No md superblock detected on /dev/sdb1.
 $ echo $DEVICES
 /dev/sdj1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1

== Stop the RAID ==

You should now stop the RAID as that may otherwise cause problems later on:

 mdadm --stop /dev/md0

If you cannot stop the RAID (due to the RAID being mounted), note down the RAID UUID and re-seat all the harddisks used by the RAID or reboot. Afterwards identify the devices again like we did before:

 $ UUID=ef1de98a:35abe6d9:bcfa355a:d30dfc24
 $ DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})
 $ echo $DEVICES
 /dev/sdq1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1

== Check your hardware ==

Harddisks fall off a RAID for all sorts of reasons. Some of them are intermittent, so first we need to check if the harddisks are OK.

We do that by reading every sector on every harddisk in the RAID.

 parallel -j0 dd if={} of=/dev/null bs=1M ::: $DEVICES

This can take a long time (days on big harddisks). You can, however, leave this running while continuing through this guide.
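If you also want a record of which harddisks (if any) failed the read test, one option is GNU Parallel's --joblog (a sketch; readtest.log is just an example file name):

 parallel -j0 --joblog readtest.log dd if={} of=/dev/null bs=1M ::: $DEVICES
 # print the jobs whose Exitval (column 7) is non-zero: those harddisks need ddrescue
 awk 'NR>1 && $7 != 0' readtest.log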

=== Hardware error ===

If the reading fails for a harddisk, you need to copy that harddisk to a new harddisk. Do that using GNU ddrescue. ddrescue can read forwards (fast) and backwards (slow). This is useful since you can sometimes only read a sector if you read it from "the other side". If you give ddrescue a log-file, it will skip the parts that have already been copied successfully, so it is OK to reboot your system if the copying makes it hang: the copying will continue where it left off.

 ddrescue -r 3 /dev/old /dev/new my_log
 ddrescue -R -r 3 /dev/old /dev/new my_log

where /dev/old is the harddisk with errors and /dev/new is the new empty harddisk.

Re-test that you can now read all sectors from /dev/new using 'dd', and remove /dev/old from the system. Then recompute $DEVICES so it includes /dev/new:

 UUID=$(mdadm -E /dev/sdj1|perl -ne '/Array UUID : (\S+)/ and print $1')
 DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})

== Making the harddisks read-only using an overlay file ==

When trying to fix a broken RAID we may cause more damage, so we need a way to revert to the current situation. One way is to make a full harddisk-to-harddisk image of every harddisk. This is slow and requires a full set of empty harddisks which may be expensive.

A faster solution is to overlay every device with a file. All changes will be written to the file and the actual device is untouched. We need to make sure the file is big enough to hold all changes, but 'fsck' normally does not change a lot, so free space of around 1% of the used space in the RAID on the filesystem holding the overlay files is normally enough. If your filesystem supports big, sparse files, you can simply make a sparse overlay file for each harddisk the same size as the harddisk.
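If you are unsure whether the filesystem holding the overlay files handles big, sparse files, a quick check is (a sketch; 4000G is just the size used in the example below):

 truncate -s 4000G sparse-test && du -h --apparent-size sparse-test && du -h sparse-test
 # an apparent size of ~4T with an actual usage near 0 means sparse files work here
 rm -f sparse-test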

Each overlay file will need a loop-device, so create that:

 parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES

Now create an overlay file for each device. Here it is assumed that your filesystem supports big, sparse files and that the harddisks are 4TB. If this fails, create smaller files (usually 1% of the harddisk capacity is sufficient):

 parallel truncate -s4000G overlay-{/} ::: $DEVICES

Setup the loop-device and the overlay device:

 parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES

Now the overlay devices are in /dev/mapper/*:

 $ OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
 $ echo $OVERLAYS 
 /dev/mapper/sds1 /dev/mapper/sdt1 /dev/mapper/sdq1 /dev/mapper/sdu1 /dev/mapper/sdv1 /dev/mapper/sdw1

You can check the disk usage of the overlay files using:

 dmsetup status
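For a snapshot target the status line has the form below (see the device-mapper snapshot documentation referenced in the functions further down); the <allocated>/<total> pair shows how many sectors of each overlay are in use:

 <name>: <start> <length> snapshot <allocated>/<total> <metadata-sectors>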

=== Reset overlay file ===

You may later need to reset the overlays to go back to the original situation. You do that by:

 parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES 
 parallel losetup -d ::: /dev/loop[0-9]*

=== Overlay manipulation functions ===

devices="/dev/sda /dev/sdb /dev/sdc"

overlay_create()
{
        free=$((`stat -c '%a*%S/1024/1024' -f .`))
        echo free ${free}M
        overlays=""
        overlay_remove
        for d in $devices; do
                b=$(basename $d)
                size_bkl=$(blockdev --getsz $d) # in 512 blocks/sectors
                # reserve 1M space for snapshot header
                # ext3 max file length is 2TB   
                truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || (echo "Do you use ext4?"; return 1)
                loop=$(losetup -f --show -- $b.ovr)
                # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
                dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
                echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
                overlays="$overlays /dev/mapper/$b"
        done
        overlays=${overlays# }
}

overlay_remove()
{
        for d in $devices; do
                b=$(basename $d)
                [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b 
                if [ -e $b.ovr ]; then
                        echo $b.ovr
                        l=$(losetup -j $b.ovr | cut -d : -f1)
                        echo $l
                        [ -n "$l" ] && losetup -d $(losetup -j $b.ovr | cut -d : -f1)
                        rm -f $b.ovr &> /dev/null
                fi
        done
}
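These functions are used roughly like this (a sketch; $overlays ends up holding the same kind of device list as $OVERLAYS above):

 devices="$DEVICES"        # the harddisks found earlier
 overlay_create            # sets up /dev/mapper/* overlays and fills $overlays
 mdadm --assemble --force /dev/md1 $overlays
 # ... experiment: fsck, mount, examine ...
 mdadm --stop /dev/md1
 overlay_remove            # drop all changes and free the loop devices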

== Optional: figure out what happened ==

The Update time tells us which drive failed when:

 $ parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Update'
 /dev/mapper/sdq1            Update Time : Sat May  4 15:32:43 2013 # 3rd to fail
 /dev/mapper/sds1            Update Time : Sat May  4 15:32:03 2013 # 2nd to fail
 /dev/mapper/sdt1            Update Time : Sat May  4 15:29:47 2013 # 1st to fail
 /dev/mapper/sdu1            Update Time : Sat May  4 15:32:49 2013
 /dev/mapper/sdv1            Update Time : Sat May  4 15:32:49 2013
 /dev/mapper/sdw1            Update Time : Sat May  4 15:32:49 2013

Looking at each harddisk's Role it is clear that the 3 devices that failed were indeed data devices. The spare did not fail:

 $ parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Role'
 /dev/mapper/sdq1           Device Role : Active device 1 # 3rd to fail
 /dev/mapper/sds1           Device Role : Active device 0 # 2nd to fail
 /dev/mapper/sdt1           Device Role : Active device 3 # 1st to fail
 /dev/mapper/sdu1           Device Role : Active device 2
 /dev/mapper/sdv1           Device Role : spare
 /dev/mapper/sdw1           Device Role : Active device 4

So we are interested in assembling a RAID with the devices that were active last (sdu1, sdw1) and the last to fail (sdq1).

== Force assembly ==

By forcing the assembly you can make mdadm clear the faulty state:

 $ mdadm --assemble --force /dev/md1 $OVERLAYS
 mdadm: forcing event count in /dev/mapper/sdq1(1) from 143 upto 148
 mdadm: clearing FAULTY flag for device 4 in /dev/md1 for /dev/mapper/sdv1
 mdadm: Marking array /dev/md1 as 'clean'
 mdadm: /dev/md1 has been started with 3 drives (out of 5) and 1 spare.

Rebuild will now start:

 $ cat /proc/mdstat 
 Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
 md1 : active raid6 dm-0[1] dm-4[6] dm-5[5] dm-3[2]
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/3] [_UU_U]
     [==>..................]  recovery = 11.5% (12284/101888) finish=0.4min speed=3071K/sec

The rebuild will write to the overlay files, so you should pause it, as the overlays will otherwise eat your disk space:

 echo 0 > /proc/sys/dev/raid/speed_limit_max
 echo 0 > /proc/sys/dev/raid/speed_limit_min
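Later, when you run the fix on the real harddisks and actually want the rebuild to proceed, raise the limits again (200000 and 1000 are the usual kernel defaults; note your system's values before zeroing them if they differ):

 echo 200000 > /proc/sys/dev/raid/speed_limit_max
 echo 1000 > /proc/sys/dev/raid/speed_limit_min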

You can add back the remaining drives as spares:

 $ parallel -j1 mdadm --add /dev/md1 ::: $OVERLAYS
 mdadm: Cannot open /dev/mapper/sdv1: Device or resource busy
 $ cat /proc/mdstat 
 Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
 md1 : active raid6 dm-2[8](S) dm-1[7] dm-0[1] dm-4[6] dm-5[5] dm-3[2]
     305664 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

=== Reset assembly ===

You may need to roll back the assembly. Do that by:

 mdadm --stop /dev/md1

== File system check ==

You now have an assembled RAID, but we still need to figure out whether the filesystem is OK.

=== XFS ===

XFS stores a log that it replays on mount. This should be done before trying to repair the file system:

 mount /dev/md1 /mnt/md1
 # DO NOT USE THE FILESYSTEM, BUT IMMEDIATELY UMOUNT
 umount /mnt/md1

In certain situations the filesystem will crash your computer if used before it has been repaired.

 xfs_repair /dev/md1

If xfs_repair fails, try with -L:

 xfs_repair -L /dev/md1

=== Other file systems ===

Run fsck on the RAID-device:

 fsck /dev/md1

If there are loads of errors:

 fsck -y /dev/md1
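If you want to gauge the damage before letting fsck change anything, a read-only pass is possible (a sketch; -n answers "no" to all questions on ext-family filesystems, so nothing is changed):

 fsck -n /dev/md1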

== Examine the filesystem ==

After fixing the filesystem it is time to see if data survived. Mount the file system:

 mount /dev/md1 /mnt/md1

And examine /mnt/md1. Do not write to it, as everything you write will go into the overlay files.

If there are problems: reset the assembly, reset the overlay files and try different options. As long as you use the overlay files, it will be hard to destroy anything.

If everything is fine, you can optionally make a backup now. Then reset the assembly, reset the overlay files, and repeat the fixing procedure on $DEVICES instead of $OVERLAYS. Congratulations: you just saved your data from a RAID failure.
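The final pass on the real harddisks is the same sequence of commands as before, just without the overlays (a sketch recapping the steps above; use whichever repair command worked on the overlays, and remember to raise the rebuild speed limits again):

 mdadm --stop /dev/md1
 parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
 parallel losetup -d ::: /dev/loop[0-9]*
 mdadm --assemble --force /dev/md1 $DEVICES
 xfs_repair /dev/md1                                # or: fsck -y /dev/md1
 parallel -j1 mdadm --add /dev/md1 ::: $DEVICES     # re-add the remaining drives as spares
 mount /dev/md1 /mnt/md1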
