Linux Raid


This site is the Linux-raid kernel list community-managed reference for Linux software RAID as implemented in recent version 4 kernels and earlier. It should replace many of the unmaintained and out-of-date documents out there such as the Software RAID HOWTO and the Linux RAID FAQ.

Where possible, information should be tagged with the minimum kernel/software version required to use the feature. Some of the information on these pages is unfortunately quite old, but we are in the process of updating it (aren't we always...)

Linux RAID issues are discussed on the linux-raid mailing list, which can be found at http://vger.kernel.org/vger-lists.html#linux-raid


Help wanted

This site was created by David Greaves and Nick Yeates. But life moved on, and despite their efforts to provide up-to-date info, the info became out of date again. Keld Simonsen then updated a lot of the information and improved the site's ranking on Google.

As of September 2016 Wol is updating it to mdadm 3.3 and the 4.x kernels (mdadm 4.0 was released in January 2017). Please contact Wol, Keld or Nick if you want to help. Please read the editing guidelines.

Where a page has been partially updated, but the updater lacks the knowledge to update all of it, please mark the old sections with "(2011)" in the section header to indicate it is old information.

Overview

[TODO: talk about optimising filesystems as per RAID_setup ]

[TODO: discuss layering things on top of RAID, i.e. partitioning an array, LVM, or a btrfs filesystem]

The 2016 rewrite does not cover LVM (at the moment), so for LVM you will find all the old stuff in the archaeology section. All the performance data is also 2011 vintage, so that has been relegated to the archaeology section too.

When Things Go Wrogn

Don't panic, Mister Mainwaring!

RAID is very good at protecting your data. In fact, NEARLY ALL data loss reported to the raid mailing list is down to user error while attempting to recover a failed array.

In particular NEVER NEVER NEVER use "mdadm --create" on an already-existing array unless you are being guided by an expert. It is the single most effective way of turning a simple recovery exercise into a major forensic problem - it may not be quite as effective as "dd if=/dev/random of=/dev/sda", but it's pretty close ...
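
Before attempting any recovery, gather as much information as you can with read-only commands; this output is also the first thing the mailing list will ask for. A minimal sketch, assuming an array /dev/md0 with component partitions /dev/sda1 and /dev/sdb1 (substitute your own device names):

  cat /proc/mdstat                     # what the kernel currently knows about its arrays
  mdadm --detail /dev/md0              # state of the (possibly partially) assembled array
  mdadm --examine /dev/sda1 /dev/sdb1  # superblock contents on the component devices

None of these commands write anything to the devices.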

The simplest things are sometimes the best. If an array fails to start after a crash or reboot and you can't get it to assemble, always try an "mdadm /dev/mdN --stop", and then try to assemble it again. Problems at boot often leave you with a partially assembled array that then refuses to do anything.
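
As a rough sketch of that sequence (the device names are placeholders, and this assumes the member devices themselves are healthy):

  mdadm --stop /dev/md0     # tear down the partially assembled array
  mdadm --assemble --scan   # then let mdadm try to assemble everything it can find
  # or name the array and its members explicitly:
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

If assembly still fails, don't start forcing things; capture the output and ask on the mailing list.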

In addition to reading this, it is probably worthwhile going to the software archaeology section and reading "RAID Recovery" and "Recovering a failed software RAID". Just be aware that these are old pages and things may have changed, and that everything still relevant in 2016 should already have been copied into the pages above.

Areas Of Interest

Hardware RAID

Proper hardware RAID systems are presented to Linux as a block device, and there is no coverage of them (yet) in this wiki.

BIOS / firmware RAID aka fake raid cards:

  • offer a few performance benefits (like CPU, bus and RAM offloading), but may often be much slower than SW raid (link?)
  • if the 'raid' card or motherboard dies then you often have to find an exact replacement and this can be tricky for older cards
  • if drives move to other machines the data can't easily be read
  • there is usually no monitoring or reporting on the array - if a problem occurs then it may not show up unless the machine is rebooted *and* someone is actually watching the BIOS boot screen (or until multiple errors occur and your data is lost)
  • you are entrusting your data to unpatchable software written into a BIOS that has probably not been tested, has no support mechanism and almost no community.
  • having seen how many bugs the kernel works around in various BIOSes it would be optimistic to think that the BIOS RAID has no bugs.

Given that the point of RAID is usually to reduce risk, it is fair to say that using fakeraid is a terrible idea, and it's better to focus energy on either true HW raid or in-kernel SW raid .... but there is nothing stopping you :)
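
One thing in-kernel SW raid does give you, in contrast to the lack of monitoring noted above, is alerting when an array fails or degrades. A minimal sketch of checking and monitoring md arrays (the mail address is a placeholder, and many distributions already start such a monitor for you):

  cat /proc/mdstat                                           # quick health check of all md arrays
  mdadm --monitor --scan --daemonise --mail=root@localhost   # mail an alert when an array fails or degrades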

Kernel Programming

This section is meant to be the home for a variety of things. With Neil Brown stepping down as maintainer (early 2016), the development process doesn't seem to be quite so "robust". That's no surprise: the new maintainers need to gain the experience Neil had of the subsystem. So this section will house documentation about how the internals of the raid subsystem work.

But documentation without specification isn't much use. There was a philosophy (famously espoused by Microsoft, especially in their Office Open XML Specification) that "the code is the documentation" or "the code is the specification". This is great for coders - one of its features is that it eliminates all bugs at a stroke! If the code is the specification, then the system has to behave as specified. So this section will also house documentation about how the internals of the raid subsystem are supposed to work.

Then, of course, we want as many people helping with the system as possible. So this section will also contain a list of projects that people can do, along with some advice on where to start. These needn't be work on the kernel itself or mdadm; there are utilities already out there (Phil's lsdrv, Brad's timeout script), and there are plenty more that would be appreciated.

Programming projects

Archaeology

This section is where all the old pages have been moved. Some of them may have been edited before being moved, but the information here is mostly out of date, covering things such as lilo, raidtools, etc. It may well be of interest to people running old systems, but it shouldn't be in the main section where it may confuse people.

RAID Archaeology

External links

Mailing list



See Spam Blocks for the spam restrictions on this site.
