Linux Raid

From Linux Raid Wiki
(Difference between revisions)
m (Overview)
(61 intermediate revisions by 5 users not shown)
Line 1: Line 1:
This site is the Linux-raid kernel list community-managed reference for Linux software RAID as implemented in recent version 3 series and 2.6 kernels.
+
== Introduction ==
 +
 
 +
This site is the Linux-raid kernel list community-managed reference for Linux software RAID as implemented in recent version 4 kernels and earlier.
 
It should replace many of the unmaintained and out-of-date documents ''out there'' such as the '''Software RAID HOWTO''' and the '''Linux RAID FAQ'''.
 
  
 
Where possible, information should be tagged with the minimum kernel/software version required to use the feature. Some of the information on these pages is unfortunately quite old, but we are in the process of updating it (aren't we always...)
 +
 +
== Mailing list ==
  
 
Linux RAID issues are discussed in the linux-raid mailing list to be found at http://vger.kernel.org/vger-lists.html#linux-raid
 
  
==Help wanted==
+
This follows kernel.org conventions. You should use "reply to all" unless explicitly asked not to. Extraneous material should be trimmed. Replies should be in-line or at the bottom.
Please contact [[David Greaves]] or [[Nick Yeates]] if you'd like to help with this site.
+
  
==Overview==
+
And please use an email client that threads correctly!
There is an [[Overview]] section that is based on the RAID HowTo, covering the following:
+
* [[Why RAID?]]
+
* [[Devices]]
+
* [[Hardware issues]]
+
* [[RAID setup]]
+
* [[Detecting, querying and testing]]
+
* [[Tweaking, tuning and troubleshooting]]
+
* [[Reconstruction]]
+
* [[Recovering a failed software RAID]]
+
* [[Growing]]
+
* [[Performance]]
+
* [[Related tools]]
+
* [[Partitioning RAID / LVM on RAID]]
+
  
The document is sprinkled with references to the deprecated raidtools which are being gradually removed.
+
== Help wanted ==
 +
This site was created by [[David Greaves]] and [[Nick Yeates]]. But life moved on, and despite their efforts to provide up-to-date info, it became out of date again. [[Keld Simonsen]] updated a lot of the information and got the site ranked well on Google.
  
==Frequently Asked Questions - FAQ==
+
As of September 2016 [[User:Anthony Youngman|Wol]] is updating it to mdadm 3.3 and the 4.x kernels (mdadm 4.0 was released in January 2017). Please contact Wol, Keld or Nick if you want to help. Please read the [[editing guidelines]].
Here goes a collection of frequently asked questions.
+
  
A [[mdadm-faq]] is available.
+
Where a page has been partially updated, but the updater lacks the knowledge to update all of it, please mark the old sections with "(2011)" in the section header to indicate it is old information.
  
==Areas Of Interest==
+
== Overview ==
 +
 
 +
* [[What is RAID and why should you want it?]]
 +
* [[Choosing your hardware, and what is a device?]]
 +
* [[What do you want in your stack?]]
 +
* [[RAID and filesystems]]
 +
* [[Setting up a (new) system]]
 +
* [[Converting an existing system]]
 +
* [[A guide to mdadm]]
 +
* [[Scrubbing the drives]]
 +
* [[Monitoring your system]]
 +
 
 +
[TODO: discuss layering things on top of raid, ie partitioning an array, LVM, or a btrfs filesystem]
 +
 
 +
The 2016 rewrite is not covering LVM (at the moment) so for LVM you will find all the old stuff in the archaeology section. Also all the performance data is 2011 vintage, so that has been relegated to the archaeology section too.
 +
 
 +
== When Things Go Wrogn ==
 +
 
 +
Don't panic, Mister Mainwaring!
 +
 
 +
RAID is very good at protecting your data. In fact, NEARLY ALL data loss reported to the raid mailing list is down to user error while attempting to recover a failed array.
 +
 
 +
In particular NEVER NEVER NEVER use "mdadm --create" on an already-existing array unless you are being guided by an expert. It is the single most effective way of turning a simple recovery exercise into a major forensic problem - it may not be quite as effective as "dd if=/dev/random of=/dev/sda", but it's pretty close ...
 +
 
 +
The simplest things are sometimes the best. If an array fails to start after a crash or reboot and you can't get it to assemble, always try an "mdadm /dev/mdN --stop", and then try to assemble it again. Problems at boot often leave you with a partially assembled array that then refuses to do anything. A "stop" followed by an "assemble" can never do any harm, and may well fix the problem. Be very careful with "--force", though, as it may trigger a resync which could destroy the contents of a drive and make recovery difficult or impossible.
 +
 
 +
* [[Asking for help]]
 +
* [[Timeout Mismatch|Timeout Mismatch (Why you shouldn't use desktop drives)]]
 +
* [[Easy Fixes|Easy Fixes (Scary problems that tend to resolve themselves)]]
 +
* [[Replacing a failed drive]]
 +
* [[Assemble Run|My array won't assemble / run]]
 +
* [[Recovering a damaged RAID]]
 +
* [[Advanced data recovery|mdadm says my array doesn't exist! (WIP)]]
 +
 
 +
In addition to reading this, it is probably worthwhile going to the software archaeology section and reading "RAID Recovery" and "Recovering a failed software RAID". Just be aware that these are old pages and things may have changed, and that everything still relevant in 2016 should have been copied into the above pages.
 +
 
 +
== Areas Of Interest ==
  
 
* [[RAID Creation]]
 
* [[RAID Recovery]]
+
* [[Drive Data Sheets]]
* [[RAID Administration]]
+
* [[RAID Boot]]
+
* [[SATA RAID Boot Recipe]]
+
* [[Preventing against a failing disk]]
+
  
==Hardware RAID==
+
== Hardware RAID ==
 
Proper hardware RAID systems are presented to Linux as a block device, and there's no coverage of them (yet) in this wiki.
 +
 +
* [[Hardware Raid Setup using MegaCli]]
  
 
BIOS / firmware RAID aka [https://ata.wiki.kernel.org/index.php/SATA_RAID_FAQ fake raid cards]:
 
Line 53: Line 77:
 
Given that the point of RAID is usually to reduce risk, it is fair to say that using fakeraid is a terrible idea and that it's better to focus energy on either true HW raid or in-kernel SW raid ... but there is nothing stopping you :)
  
==External links==
+
== Kernel Programming ==
 +
 
 +
This section is meant to be the home for a variety of things. With Neil Brown stepping down as maintainer (early 2016), the development process doesn't seem to be quite so "robust". That is no surprise: the new maintainers need to gain the experience Neil had of the subsystem. So this section will house documentation about how the internals of the raid subsystem work.
 +
 
 +
But documentation without specification isn't much use. There was a philosophy (famously espoused by Microsoft, especially in their Office Open XML Specification) that "the code is the documentation" or "the code is the specification". This is great for coders - one of its features is that it eliminates all bugs at a stroke! If the code is the specification, then the system <em>has</em> to behave as specified. So this section will also house documentation about how the internals of the raid subsystem are supposed to work.
 +
 
 +
Then, of course, we want as many people helping with the system as possible. So this section will also contain a list of projects that people can do, and some advice on where to start. These needn't be work on the kernel itself or on mdadm; there are utilities already out there (Phil's lsdrv, Brad's timeout script) and there are plenty more that would be appreciated.
 +
 
 +
[[Programming projects]]
 +
 
 +
== Archaeology ==
 +
 
 +
This section is where all the old pages have been moved. Some of them may have been edited before being moved, but the information here is mostly out of date, covering things such as lilo and raidtools. It may well be of interest to people running old systems, but shouldn't be in the main section where it may confuse people.
 +
 
 +
[[Valley Of The Kings|RAID Archaeology]]
 +
 
 +
== External links ==
 
* [[wikipedia:How to edit a page|Editing pages]]
 
 
* [http://en.wikipedia.org/wiki/RAID Wikipedia RAID] including description of specific Linux RAID types
 

Revision as of 20:38, 21 January 2018


Introduction

This site is the Linux-raid kernel list community-managed reference for Linux software RAID as implemented in recent version 4 kernels and earlier. It should replace many of the unmaintained and out-of-date documents out there such as the Software RAID HOWTO and the Linux RAID FAQ.

Where possible, information should be tagged with the minimum kernel/software version required to use the feature. Some of the information on these pages is unfortunately quite old, but we are in the process of updating it (aren't we always...)

Mailing list

Linux RAID issues are discussed in the linux-raid mailing list to be found at http://vger.kernel.org/vger-lists.html#linux-raid

This follows kernel.org conventions. You should use "reply to all" unless explicitly asked not to. Extraneous material should be trimmed. Replies should be in-line or at the bottom.

And please use an email client that threads correctly!

Help wanted

This site was created by David Greaves and Nick Yeates. But life moved on, and despite their efforts to provide up-to-date info, it became out of date again. Keld Simonsen updated a lot of the information and got the site ranked well on Google.

As of September 2016 Wol is updating it to mdadm 3.3 and the 4.x kernels (mdadm 4.0 was released in January 2017). Please contact Wol, Keld or Nick if you want to help. Please read the editing guidelines.

Where a page has been partially updated, but the updater lacks the knowledge to update all of it, please mark the old sections with "(2011)" in the section header to indicate it is old information.

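Overview

  • What is RAID and why should you want it?
  • Choosing your hardware, and what is a device?
  • What do you want in your stack?
  • RAID and filesystems
  • Setting up a (new) system
  • Converting an existing system
  • A guide to mdadm
  • Scrubbing the drives
  • Monitoring your system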

[TODO: discuss layering things on top of raid, ie partitioning an array, LVM, or a btrfs filesystem]

The 2016 rewrite is not covering LVM (at the moment) so for LVM you will find all the old stuff in the archaeology section. Also all the performance data is 2011 vintage, so that has been relegated to the archaeology section too.

When Things Go Wrogn

Don't panic, Mister Mainwaring!

RAID is very good at protecting your data. In fact, NEARLY ALL data loss reported to the raid mailing list is down to user error while attempting to recover a failed array.

In particular NEVER NEVER NEVER use "mdadm --create" on an already-existing array unless you are being guided by an expert. It is the single most effective way of turning a simple recovery exercise into a major forensic problem - it may not be quite as effective as "dd if=/dev/random of=/dev/sda", but it's pretty close ...
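
Before doing anything that writes to the disks, a read-only look at the array is usually the safer first move. The following is a minimal sketch, not a recovery procedure: the device names, the `collect_raid_info` helper and the `DRY_RUN` switch are illustrative assumptions, not something defined by this wiki or by mdadm. It relies only on `/proc/mdstat` and `mdadm --examine`, both of which just read state.

```shell
#!/bin/sh
# Sketch: gather read-only information about an array before changing anything.
# collect_raid_info, show and DRY_RUN are hypothetical names used for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

show() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

collect_raid_info() {
    # The kernel's current view of all md arrays.
    show cat /proc/mdstat
    # The RAID superblock of each member device; --examine only reads.
    for dev in "$@"; do
        show mdadm --examine "$dev"
    done
}

# Assumption: replace these with the member devices of your own array.
collect_raid_info /dev/sda1 /dev/sdb1
```

With `DRY_RUN=0` (and root privileges) the commands are executed rather than printed; the output is the sort of information worth saving before attempting any repair.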

The simplest things are sometimes the best. If an array fails to start after a crash or reboot and you can't get it to assemble, always try an "mdadm /dev/mdN --stop", and then try to assemble it again. Problems at boot often leave you with a partially assembled array that then refuses to do anything. A "stop" followed by an "assemble" can never do any harm, and may well fix the problem. Be very careful with "--force", though, as it may trigger a resync which could destroy the contents of a drive and make recovery difficult or impossible.
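
The stop-then-assemble sequence described above can be sketched as follows. This is an illustration under assumptions: `/dev/md0` stands in for your array, `stop_then_assemble` and `DRY_RUN` are hypothetical names, and `mdadm --assemble --scan` assumes the array can be identified from mdadm.conf or from the superblocks on the member devices. Note that `--force` is deliberately absent.

```shell
#!/bin/sh
# Sketch: stop a partially assembled array, then try a plain assemble.
# stop_then_assemble, run, MD_DEVICE and DRY_RUN are illustrative names.
MD_DEVICE="${MD_DEVICE:-/dev/md0}"   # assumption: replace with your array
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_then_assemble() {
    # Release whatever partial assembly the boot process left behind.
    run mdadm --stop "$1"
    # Plain assemble from the config file / superblock scan; no --force.
    run mdadm --assemble --scan
}

stop_then_assemble "$MD_DEVICE"
```

Run it first with the default `DRY_RUN=1` to see what would be executed, then with `DRY_RUN=0` as root. If the plain assemble still fails, that is the point to stop and ask for help rather than reach for `--force` or `--create`.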

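  • Asking for help
  • Timeout Mismatch (Why you shouldn't use desktop drives)
  • Easy Fixes (Scary problems that tend to resolve themselves)
  • Replacing a failed drive
  • My array won't assemble / run
  • Recovering a damaged RAID
  • mdadm says my array doesn't exist! (WIP)

In addition to reading this, it is probably worthwhile going to the software archaeology section and reading "RAID Recovery" and "Recovering a failed software RAID". Just be aware that these are old pages and things may have changed, and that everything still relevant in 2016 should have been copied into the above pages.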

Areas Of Interest

Hardware RAID

Proper hardware RAID systems are presented to Linux as a block device, and there's no coverage of them (yet) in this wiki.

  • Hardware Raid Setup using MegaCli

BIOS / firmware RAID aka fake raid cards:

  • offer a few performance benefits (like CPU, bus and RAM offloading), but may often be much slower than SW raid (link?)
  • if the 'raid' card or motherboard dies then you often have to find an exact replacement and this can be tricky for older cards
  • if drives move to other machines the data can't easily be read
  • there is usually no monitoring or reporting on the array - if a problem occurs then it may not show up unless the machine is rebooted *and* someone is actually watching the BIOS boot screen (or until multiple errors occur and your data is lost)
  • you are entrusting your data to unpatchable software written into a BIOS that has probably not been tested, has no support mechanism and almost no community.
  • having seen how many bugs the kernel works around in various BIOSes it would be optimistic to think that the BIOS RAID has no bugs.

Given that the point of RAID is usually to reduce risk, it is fair to say that using fakeraid is a terrible idea and that it's better to focus energy on either true HW raid or in-kernel SW raid ... but there is nothing stopping you :)

Kernel Programming

This section is meant to be the home for a variety of things. With Neil Brown stepping down as maintainer (early 2016), the development process doesn't seem to be quite so "robust". That is no surprise: the new maintainers need to gain the experience Neil had of the subsystem. So this section will house documentation about how the internals of the raid subsystem work.

But documentation without specification isn't much use. There was a philosophy (famously espoused by Microsoft, especially in their Office Open XML Specification) that "the code is the documentation" or "the code is the specification". This is great for coders - one of its features is that it eliminates all bugs at a stroke! If the code is the specification, then the system has to behave as specified. So this section will also house documentation about how the internals of the raid subsystem are supposed to work.

Then, of course, we want as many people helping with the system as possible. So this section will also contain a list of projects that people can do, and some advice on where to start. These needn't be work on the kernel itself or on mdadm; there are utilities already out there (Phil's lsdrv, Brad's timeout script) and there are plenty more that would be appreciated.

Programming projects

Archaeology

This section is where all the old pages have been moved. Some of them may have been edited before being moved, but the information here is mostly out of date, covering things such as lilo and raidtools. It may well be of interest to people running old systems, but shouldn't be in the main section where it may confuse people.

RAID Archaeology

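External links

  • Editing pages
  • Wikipedia RAID (http://en.wikipedia.org/wiki/RAID) including description of specific Linux RAID types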

