Recovering a damaged RAID

Back to My array won't assemble / run Forward to Recovering a failed software RAID

WORK IN PROGRESS!!! See Recovering a failed software RAID for the original page.

The previous pages cover replacing failed drives. This page covers recovering arrays where the drives are okay but the array itself is damaged.

Utilities required by the examples on this page

The examples on this page use GNU parallel to run multiple instances of a command with different arguments. Since we often need to run the same command over every disk or partition that makes up an array, it simply makes life easier.

If your distro doesn't package it for you, you can download it from http://www.gnu.org/software/parallel/
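
If you have not used parallel before, here is a minimal sketch of how it works: the {} placeholder is replaced by each argument after :::, and -k keeps the output in argument order.

 $ parallel -k echo checking {} ::: /dev/sda /dev/sdb
 checking /dev/sda
 checking /dev/sdb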

Identifying the RAID

We will need the UUID of the array to identify its member disks. This is especially important if you have multiple RAID arrays connected to the system. Retrieve the array UUID from any valid partition that makes up the array of interest (here /dev/sdj1):

 $ UUID=$(mdadm -E /dev/sdj1|perl -ne '/Array UUID : (\S+)/ and print $1')
 $ echo $UUID
 ef1de98a:35abe6d9:bcfa355a:d30dfc24
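
If you are not sure which partition to examine, mdadm can also scan every device it can find and print one ARRAY line per detected array. Illustrative output, assuming the array from the example above; your device and metadata fields will differ:

 $ mdadm --examine --scan
 ARRAY /dev/md/1 metadata=1.2 UUID=ef1de98a:35abe6d9:bcfa355a:d30dfc24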

We use the $UUID to identify all the partitions that make up the array:

 $ DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})
 {5}     mdadm: cannot open /dev/{5}: No such file or directory
 sda1    mdadm: No md superblock detected on /dev/sda1.
 sdb1    mdadm: No md superblock detected on /dev/sdb1.
 $ echo $DEVICES
 /dev/sdj1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1
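
If you would rather not depend on parallel for this step, a plain shell loop gives the same result. This sketch assumes the members are partitions on /dev/sd* disks:

 DEVICES=$(for p in /dev/sd*[0-9]; do mdadm -E "$p" 2>/dev/null | grep -q "$UUID" && echo "$p"; done)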

Making the hard disks read-only using an overlay file

When trying to fix a broken RAID we may cause more damage, so we need a way to revert to the current situation. One way is to make a full disk-to-disk image of every hard disk. This is slow and requires a full set of empty hard disks, which may be expensive.

A faster solution is to overlay every device with a file. All changes will be written to the file and the actual device is left untouched. We need to make sure the file is big enough to hold all changes, but fsck normally does not change much, so your local filesystem should only need to hold around 1% of the used space in the RAID. If your filesystem supports big, sparse files, you can simply make a sparse overlay file for each hard disk the same size as the disk.
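
As a rough worked example: with six 4TB members and a 1% change rate, each overlay may grow to around 40GB, i.e. roughly 240GB in total. Check the free space on the filesystem that will hold the overlay files before you start:

 df -h .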

Each overlay file will need a loop device, so create one for each member:

 parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES

Now create an overlay file for each device. Here it is assumed that your filesystem supports big, sparse files and the hard disks are 4TB. If that fails, create smaller files (usually 1% of the disk capacity is sufficient):

 parallel truncate -s4000G overlay-{/} ::: $DEVICES
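
Because the overlay files are sparse, they consume almost no real disk space until writes arrive. You can compare the apparent size with the actual allocation:

 du -h --apparent-size overlay-*
 du -h overlay-*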

Set up the loop device and the overlay device:

 parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
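
You can verify that each mapping was created, and inspect its snapshot table, with:

 dmsetup table
 ls /dev/mapper/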

Now the overlay devices are in /dev/mapper/*:

 $ OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
 $ echo $OVERLAYS 
 /dev/mapper/sdj1 /dev/mapper/sdm1 /dev/mapper/sdn1 /dev/mapper/sdo1 /dev/mapper/sdp1 /dev/mapper/sdq1

You can check the disk usage of the overlay files using:

 dmsetup status
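
For a snapshot target the status field reads <sectors allocated>/<total sectors in the overlay>, followed by the metadata sectors, so you can watch how full each overlay is. Illustrative output for a freshly created 4TB member with a 4000G overlay; your numbers will differ:

 $ dmsetup status
 sdj1: 0 7814037168 snapshot 16/8388608000 16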

Reset overlay file

You may later need to reset the overlays to get back to the original situation. Do that with:

 parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES 
 parallel losetup -d ::: /dev/loop[0-9]*
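
After the reset, nothing from the overlays should remain; apart from unrelated mappings and loop devices, both of these should come back empty:

 dmsetup ls
 losetup -a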

Overlay manipulation functions

devices="/dev/sda /dev/sdb /dev/sdc"   # the member devices to overlay - adjust to your array

overlay_create()
{
        # free space (in MB) on the filesystem holding the overlay files
        free=$(($(stat -c '%a*%S/1024/1024' -f .)))
        echo free ${free}M
        overlays=""
        overlay_remove
        for d in $devices; do
                b=$(basename $d)
                size_bkl=$(blockdev --getsz $d) # size in 512-byte sectors
                # sparse file as big as the device, plus 1M for the snapshot header
                # (note: ext3 cannot hold files larger than 2TB; use ext4)
                truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || { echo "overlay creation failed - do you use ext4?"; return 1; }
                loop=$(losetup -f --show -- $b.ovr)
                # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
                dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
                echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
                overlays="$overlays /dev/mapper/$b"
        done
        overlays=${overlays# }  # strip the leading space
}

overlay_remove()
{
        for d in $devices; do
                b=$(basename $d)
                # tear down the device-mapper snapshot, if present
                [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b
                if [ -e $b.ovr ]; then
                        echo $b.ovr
                        # detach the loop device backing this overlay file
                        l=$(losetup -j $b.ovr | cut -d : -f1)
                        echo $l
                        [ -n "$l" ] && losetup -d "$l"
                        rm -f $b.ovr
                fi
        done
}
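
A sketch of how these functions might be used; the array name /dev/md1 and the member list are assumptions, so substitute your own:

 devices="/dev/sdj1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1"
 overlay_create
 mdadm --assemble --force /dev/md1 $overlays   # experiment on the overlays, not the real disks
 # ... run fsck, copy data, etc. against /dev/md1 ...
 mdadm --stop /dev/md1
 overlay_remove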


Back to My array won't assemble / run Forward to Recovering a failed software RAID