Recovering a failed software RAID

From Linux Raid Wiki

Revision as of 12:04, 3 May 2013

The software RAID in Linux is well tested, but even well-tested software RAID can fail.

In the following it is assumed that you have a software RAID where more harddisks have failed than the redundancy can cover.

So your /proc/mdstat looks something like this:

 md3 : active raid6 loop3[10](S) loop39[9] loop38[8] loop37[7] loop36[6] loop35[5] loop34[4] loop33[3](F) loop32[2](F) loop31[1](F) loop30[0]
 7168 blocks super 1.2 level 6, 128k chunk, algorithm 2 [10/7] [U___UUUUUU]

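Here is a RAID6 that has lost 3 harddisks: loop31, loop32 and loop33 are marked (F) for failed, [10/7] says that only 7 of the 10 devices are working, and each _ in [U___UUUUUU] is a missing device. loop3 is a spare (S). A RAID6 can only survive the loss of 2 harddisks.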
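To list the failed and spare members without reading such a line by eye, a small awk filter can help. This is a minimal sketch that assumes the member format shown above (`name[role]`, optionally followed by `(F)` for failed or `(S)` for spare); the function name `md_members` is hypothetical and not part of mdadm.

```shell
#!/bin/bash
# md_members: read /proc/mdstat text on stdin and print every array member
# with its state: "failed", "spare" or "active" (anything else).
# Hypothetical helper, not part of mdadm.
md_members() {
  awk '
    # Array lines look like "md3 : active raid6 loop3[10](S) ..."
    $2 == ":" {
      for (i = 3; i <= NF; i++) {
        # Member fields contain "[role]"; skip words such as "active" or "raid6"
        if ($i !~ /\[[0-9]+\]/) continue
        dev = $i
        state = "active"
        if (dev ~ /\(F\)$/) state = "failed"
        else if (dev ~ /\(S\)$/) state = "spare"
        sub(/\[.*$/, "", dev)   # strip "[role]" and any "(flag)"
        print dev, state
      }
    }
  '
}

# Usage: md_members < /proc/mdstat
```

For the example above this prints loop33, loop32 and loop31 as failed and loop3 as spare.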


Check your hardware

Harddisks fall off a RAID for all sorts of reasons. Some of them are intermittent, so first we need to check if the harddisks are OK. We do that by reading every sector of every harddisk.

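 # List the devices used in the RAID
 DEVICES="/dev/sdc1 /dev/sdd1 /dev/sdf1"
 # Read each device completely; dd exits with an error if a sector cannot be read.
 # bs=1M is much faster than dd's default 512-byte blocks.
 parallel -j0 dd if={} of=/dev/null bs=1M ::: $DEVICES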

If you do not have GNU Parallel, it can be installed by:

 wget -O - pi.dk/3 | bash
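If installing GNU Parallel is not an option, the same check can be done one harddisk at a time with a plain loop, which also prints a clear OK/FAILED line per device. This is a sketch: the function name `check_readable` is hypothetical, and it relies on dd returning a non-zero exit status when a read fails.

```shell
#!/bin/bash
# check_readable DEVICE...: read every byte of each device (or file) with dd
# and report "OK <device>" or "FAILED <device>".
# Sequential, so it is slower than the GNU Parallel version for several harddisks.
check_readable() {
  local dev
  for dev in "$@"; do
    # bs=1M reads in 1 MiB blocks; dd fails on the first unreadable block
    if dd if="$dev" of=/dev/null bs=1M 2>/dev/null; then
      echo "OK $dev"
    else
      echo "FAILED $dev"
    fi
  done
}

# Usage (as root): check_readable $DEVICES
```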


Hardware error

If the reading fails for a harddisk, you need to copy that harddisk to a new harddisk. Do that using GNU ddrescue. ddrescue can read forwards (fast) and backwards (slow). This is useful because a sector can sometimes only be read when it is approached from "the other side". If you give ddrescue a log-file (called a mapfile in newer versions of ddrescue), it will skip the parts that have already been copied successfully. It is therefore OK to reboot your system if the copying makes the system hang: the copying will continue where it left off.

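 # -f is needed because the output is a device, not a regular file;
 # -r 3 makes 3 retry passes over the sectors that could not be read
 ddrescue -f -r 3 /dev/old /dev/new my_log
 # -R reverses the direction; my_log makes it skip what is already copied
 ddrescue -f -R -r 3 /dev/old /dev/new my_log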

where /dev/old is the harddisk with errors and /dev/new is the new empty harddisk.
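The two passes can be wrapped in a function so they are always run with the same log-file. This is a minimal sketch; `rescue_disk` and the `DRY_RUN` switch are hypothetical conveniences for checking the commands before running them against real harddisks.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# rescue_disk OLD NEW LOGFILE: copy a failing harddisk with GNU ddrescue.
# Pass 1 reads forwards; pass 2 reads the remaining bad areas backwards (-R).
# Both passes share LOGFILE, so an interrupted copy resumes where it stopped.
rescue_disk() {
  local old=$1 new=$2 log=$3
  # -f: allow writing to a device; -r 3: three retry passes over bad sectors
  run ddrescue -f -r 3 "$old" "$new" "$log"
  run ddrescue -f -R -r 3 "$old" "$new" "$log"
}

# Usage (as root): rescue_disk /dev/old /dev/new my_log
```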

Re-test with 'dd' that you can now read all sectors of /dev/new, and remove /dev/old from the system, so that mdadm does not find two harddisks with the same RAID superblock.


Notes below - must be fleshed out

Making the devices read-only using an overlay file

When trying to fix a broken RAID we may cause more damage, so we need a way to revert to the current situation. One way is to make a full harddisk-to-harddisk image of every harddisk. This is slow.

A faster solution is to overlay every device with a file. All changes will be written to the file and the actual device is untouched. We need to make sure the file is big enough to hold all changes, but 'fsck' normally will not change a lot, so the file system holding the overlay files should have room for around 1% of the capacity of the harddisks in the RAID. If your filesystem supports big, sparse files, you can simply make a sparse overlay file for each harddisk the same size as the harddisk.

Each overlay file will need a loop device, so create the loop device nodes if they are missing:

 parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES

Now create an overlay file for each device. Here it is assumed that your filesystem supports big, sparse files and the harddisks are 4TB. If that fails, create a smaller file (usually 1% of the harddisk capacity is sufficient):

 parallel truncate -s4000G overlay-{/} ::: $DEVICES
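If the sparse 4000G file cannot be created, the 1% rule of thumb can be turned into a concrete size. A minimal sketch; the helper names `overlay_size` and `free_bytes` are hypothetical, and 1% is only the estimate given above, not a guarantee that all changes will fit.

```shell
#!/bin/bash
# overlay_size DEVICE_BYTES [PERCENT]: size in bytes for an overlay file
# holding PERCENT (default 1) percent of a harddisk, rounded up.
overlay_size() {
  local bytes=$1 pct=${2:-1}
  echo $(( (bytes * pct + 99) / 100 ))
}

# free_bytes DIR: free space in bytes on the file system holding DIR
# (uses the --output option of GNU coreutils df).
free_bytes() {
  df -B1 --output=avail "$1" | tail -n 1 | tr -d ' '
}

# Usage (as root), for each harddisk:
#   size=$(overlay_size "$(blockdev --getsize64 /dev/sdc1)")
#   truncate -s "$size" overlay-sdc1
# and compare the sum of the sizes with: free_bytes .
```

For a 4TB harddisk (4000000000000 bytes) this gives 40000000000 bytes, i.e. 40 GB per overlay file.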

Set up the loop device and the overlay device:

 parallel 'size=$(blockdev --getsz {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
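The same setup can be written as a plain shell function, which may be easier to debug one harddisk at a time. This is a sketch assuming util-linux `losetup`/`blockdev` and `dmsetup` are installed; the function names are hypothetical. The table line is "start length snapshot origin COW-device P chunksize": `blockdev --getsz` reports the length in 512-byte sectors, P makes the snapshot persistent, and 8 sectors gives 4 KiB chunks.

```shell
#!/bin/bash
# snapshot_table SECTORS ORIGIN COW: device-mapper table line for a
# persistent ("P") snapshot of ORIGIN whose changes go to COW,
# using 8-sector (4 KiB) chunks.
snapshot_table() {
  echo "0 $1 snapshot $2 $3 P 8"
}

# setup_overlay DEVICE: attach overlay-<name> (in the current directory) to a
# free loop device and create /dev/mapper/<name> on top of DEVICE. Run as root.
setup_overlay() {
  local dev=$1 name size loop
  name=$(basename "$dev")
  size=$(blockdev --getsz "$dev")                # size in 512-byte sectors
  loop=$(losetup -f --show -- "overlay-$name")   # first free loop device
  snapshot_table "$size" "$dev" "$loop" | dmsetup create "$name"
}

# Usage (as root): for d in $DEVICES; do setup_overlay "$d"; done
```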

Now the overlay devices are in /dev/mapper/*. You can check how much of each overlay file is used with:

 dmsetup status
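For snapshot targets the status ends in "allocated/total metadata" (all in sectors), so the fill level of each overlay can be computed from it. A sketch assuming lines of the form `sdc1: 0 7814037168 snapshot 16/8388608000 16`; the function name `overlay_usage` is hypothetical. When the status field is not a fraction (for example "Invalid"), it is printed as-is, since the overlay is then no longer usable.

```shell
#!/bin/bash
# overlay_usage: read "dmsetup status" output on stdin and print, for each
# snapshot target, how full its overlay is in percent.
# Assumes the status format "<allocated>/<total> <metadata>".
overlay_usage() {
  awk '
    $4 == "snapshot" {
      name = $1
      sub(/:$/, "", name)
      if ($5 !~ /\//) {
        # Not a fraction, e.g. "Invalid": print the status text unchanged
        rest = $5
        for (i = 6; i <= NF; i++) rest = rest " " $i
        print name, rest
        next
      }
      split($5, a, "/")
      printf "%s %d%%\n", name, a[1] * 100 / a[2]
    }
  '
}

# Usage (as root): dmsetup status | overlay_usage
```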

Reset overlay file

You may later need to reset the overlays to go back to the original situation. You do that by:

 parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
 # Warning: this detaches all loop devices on the system, not only the overlay ones
 parallel losetup -d ::: /dev/loop[0-9]*
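The second command above detaches every loop device on the system, not only the ones used for the overlays. A more targeted variant looks up the loop device backed by each overlay file with `losetup -j`. This is a sketch: `reset_overlay`, `OVERLAY_DIR` and `DRY_RUN` are hypothetical, and it assumes losetup recorded the absolute path of the overlay file (in dry-run mode the read-only `losetup -j` lookup still runs).

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# reset_overlay DEVICE: remove /dev/mapper/<name>, detach only the loop device
# backed by overlay-<name>, and delete the overlay file.
# OVERLAY_DIR is the directory holding the overlay files (default: current dir).
reset_overlay() {
  local name file loop
  name=$(basename "$1")
  file="${OVERLAY_DIR:-$PWD}/overlay-$name"
  run dmsetup remove "$name"
  # losetup -j prints lines like "/dev/loop1: []: (/root/overlay-sdc1)";
  # keep only the device name before the first colon
  loop=$(losetup -j "$file" 2>/dev/null | cut -d: -f1)
  # Unquoted on purpose: several matching loop devices become separate arguments
  if [ -n "$loop" ]; then run losetup -d $loop; fi
  run rm -f -- "$file"
}

# Usage (as root): for d in $DEVICES; do reset_overlay "$d"; done
```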


Identify which drives are used for what

Identify the harddisks that are currently active and the harddisk that failed last (but was still fully synced when it failed).
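`mdadm --examine` prints the RAID superblock of a member, including an Events counter that increases with every superblock update. Among the failed harddisks, the one whose counter is closest to that of the active ones is usually the one that failed last. A sketch that lists the counters, assuming the 1.2 metadata output format (a line like `Events : 18`); the function names are hypothetical.

```shell
#!/bin/bash
# events_from_examine: print the "Events : N" value from mdadm --examine
# output read on stdin.
events_from_examine() {
  awk -F' : ' '$1 ~ /^ *Events$/ { v = $2; gsub(/[[:space:]]/, "", v); print v; exit }'
}

# md_events DEVICE...: print "<events> <device>" for each RAID member,
# highest event count first. Run as root (reads the superblocks).
md_events() {
  local dev
  for dev in "$@"; do
    printf '%s %s\n' "$(mdadm --examine "$dev" | events_from_examine)" "$dev"
  done | sort -rn
}

# Usage (as root): md_events /dev/mapper/*
```

The "Update Time" line of `mdadm --examine` can be used to double-check the order in which the harddisks failed.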


Force assembly

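Force the assembly with mdadm --assemble --force, giving the overlay devices explicitly on the command line. (--scan may be incorrect due to the overlay devices: each overlay device carries the same RAID superblock as the harddisk below it.)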
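A sketch of the forced assembly from the overlay devices, passing the active harddisks plus the one that failed last. The `mdadm --stop` step is an assumption (not in the notes above) for the case where the degraded array is still listed in /proc/mdstat; `assemble_overlays` and `DRY_RUN` are hypothetical.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# assemble_overlays MD DEVICE...: force-assemble MD from /dev/mapper/<name>
# for each DEVICE instead of from the real harddisks.
assemble_overlays() {
  local md=$1 dev
  shift
  local -a overlays=()
  for dev in "$@"; do
    overlays+=("/dev/mapper/$(basename "$dev")")
  done
  # Stop the half-running array first; ignore the error if it is not running
  run mdadm --stop "$md" || true
  run mdadm --assemble --force "$md" "${overlays[@]}"
}

# Usage (as root): assemble_overlays /dev/md3 /dev/sdc1 /dev/sdd1 /dev/sdf1
```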


File system check

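Some file systems (such as XFS) need to be mounted before fsck: mounting replays the log, and xfs_repair refuses to run on a file system whose log still contains changes.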

mount (for xfs only)

umount (for xfs only)

fsck (xfs_repair for xfs)

mount

Check that your files are there. If they are not, reset the overlay files to get back to the unmodified harddisks, and try different options. For xfs_repair that can include -L, which zeroes the log (any metadata changes still in the log are lost).
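The XFS sequence above can be sketched as a function. The mount point, `check_xfs` and `DRY_RUN` are hypothetical; for other file systems the mount/umount steps are skipped and the file system's own fsck is run instead of xfs_repair.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# check_xfs MD MOUNTPOINT: replay the XFS log by mounting and unmounting,
# run xfs_repair, then mount again so the files can be inspected.
check_xfs() {
  local md=$1 mnt=$2
  run mount "$md" "$mnt"   # mounting replays the XFS log
  run umount "$mnt"
  run xfs_repair "$md"     # add -L only as a last resort: it zeroes the log
  run mount "$md" "$mnt"
}

# Usage (as root): check_xfs /dev/md3 /mnt/md3
```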
