Replacing a failed drive

Back to Timeout Mismatch | Forward to Reconstruction

Can you add a new drive?

One thing that repeatedly crops up is people trying to rebuild an array when they have no spare SATA slots to add the new drive. Get an add-in card that provides extra SATA ports, or a USB disk cradle - preferably USB3, but USB2 will do. You can get a USB case instead, but a cradle that leaves the drive exposed is going to be a lot easier if you have to switch several drives.

If you're forced to rebuild an array, you want as much of the original array in place as possible. If a drive has failed completely, then there's no problem just taking it out and sticking a new drive in, but if the array is in trouble, then you want to --replace the drive if possible (see the sketch below), and you can't do that if you don't have a spare slot to add the new drive.
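As an illustration only - the device names here are hypothetical and not from the original page - assume the failing member is /dev/sdd1 in /dev/md0 and the new drive has been partitioned as /dev/sdg1. With a reasonably recent kernel and mdadm 3.3 or later, the replacement goes something like:

mdadm /dev/md0 --add /dev/sdg1
mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sdg1

The data is copied onto the new drive while the old one stays in the array, so whatever redundancy you still have is preserved; only once the copy finishes is the old drive marked faulty and ready to be removed.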

Is your array still redundant?

You are running a three-disk mirror, or RAID6, I trust. And have a spare drive configured to take over when one fails?

This is the ideal scenario. When one drive fails, the RAID will seamlessly replace it with a spare, and life will carry on without the user noticing anything is wrong. Most of us don't have the hardware to support all that. Or, as the author has been told on several occasions, the admins/operators were not monitoring the array and did not realise anything had gone wrong until it was (almost) too late.
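For the record - a minimal sketch, not from the original page, with the mail address as a placeholder - keeping an eye on an array can be as little as:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --monitor --scan --daemonise --mail=root@localhost

Most distributions ship a service that runs the monitor for you; setting MAILADDR in /etc/mdadm.conf (or /etc/mdadm/mdadm.conf) achieves the same thing.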

If you are running an array, you need to monitor it. Failed drives must be removed and replaced as soon as possible. If your array is still redundant, then just remove the failed device and replace it:

mdadm /dev/mdX [--fail /dev/sdx1] --remove /dev/sdx1 --add /dev/sdy1
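Here /dev/mdX is the array, /dev/sdx1 the failed member and /dev/sdy1 its replacement; the --fail step is only needed if the kernel has not already marked the device faulty. As a worked example with hypothetical names - failed member /dev/sdb1 in /dev/md0, replacement partitioned as /dev/sde1:

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sde1
cat /proc/mdstat

The last command lets you watch the rebuild progress.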

So you have no redundancy!

Remember, RAID is not a backup! If you lose redundancy, you need to take a backup! The act of trying to recover an array is often enough to tip another drive over the edge and cause the entire array to fail.
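One common way to take that backup - sketched here with hypothetical device names, not prescribed by the original page - is to image the struggling member onto a fresh drive with GNU ddrescue before touching the array any further:

ddrescue -f -n /dev/sdc /dev/sdf /root/sdc.map
ddrescue -f -r3 /dev/sdc /dev/sdf /root/sdc.map

The first pass grabs everything that reads cleanly, the second retries the bad areas; the map file lets ddrescue resume and remember what has already been recovered.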

ddrescue or rebuild

Back to Timeout Mismatch | Forward to Reconstruction