A guide to mdadm

From Linux Raid Wiki
Back to Converting an existing system Forward to Monitoring your system

This page is an overview of mdadm. It is NOT intended as a replacement for the man pages - anything covered in detail there will be skimmed over here. This is meant to provide examples that you can adapt for yourselves.

Overview

mdadm has replaced all the previous tools for managing raid arrays. It manages nearly all the user space side of raid. There are a few things that need to be done by writing to the /proc filesystem, but not much.
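
For example, most routine status checking is done by reading /proc/mdstat, and the resync speed limits are among the few settings that are still changed by writing under /proc (the speed value here is just an arbitrary example):

cat /proc/mdstat                                  # show the state of all arrays
echo 50000 > /proc/sys/dev/raid/speed_limit_min   # raise the minimum resync speed (KB/s per device)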

Getting mdadm

This is a pretty standard part of any distro, so you should use your standard distro software management tool. If, however, you are having any problems it does help to be running the absolute latest version, which can be downloaded with

git clone git://neil.brown.name/mdadm

In the absence of any other preferences, it belongs in the /usr/local/src directory. As a linux-specific program there is none of this autoconf stuff - just follow the instructions as per the INSTALL file.
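
Building it is the usual Makefile affair - roughly the following, though check the INSTALL file for the details of your version:

cd mdadm
make
make install     # as root, or via sudo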

Modes

mdadm has seven modes. You will normally only use a few of them. They are as follows:-

Assemble

This is probably the mode that is used most, but you won't be using it much - it happens in the background. Every time the system is booted, this needs to run. It scans the drives, looking for superblocks, and rebuilds all the arrays for you. This is why you need an initramfs when booting off a raid array - because mdadm is a user-space program, if root is on an array then we have a catch-22 - we can't boot until we have root, and we can't have root until we've booted and can run mdadm.
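
If the automatic scan doesn't do what you want, you can also assemble a single array by hand, naming its members explicitly (the device names here are just examples):

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1    # assemble one array from known members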

Create

This is the first of the two modes you will use a lot. As the name implies, it creates arrays, and writes the superblocks for arrays that have them. It also fires off initialisation - making sure that the disks of a mirror are identical, or that on a parity array the parities are correct. This is why raids 5&6 are created in degraded mode - if they weren't then any check of the raid would spew errors for areas that hadn't been written to.

Grow

A bit of a misnomer, this mode takes care of all operations that change the size or shape of an array, such as changing the raid level, changing the number of active devices, and so on.

Manage

Almost the same as "grow", this takes care of managing the devices that make up your arrays, rather than reshaping one single array. It's used to add spares (which can be shared across multiple arrays), and as a matter of symmetry it is also used to remove (failed) devices.

Follow or Monitor

This mode runs in the background, watching your arrays and reporting (or reacting to) events such as a device failing. It is covered in more detail on the Monitoring your system page.

Build

This is a relic of when superblocks didn't exist. It is used to (re)create an array, and should not be used unless you know exactly what you are doing. Because there are no superblocks, or indeed, any array metadata, the system just has to assume you got everything right because it has no way of checking up on you.
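
For completeness, a build invocation looks much like a create, just with no superblock ever being written - a sketch only, with example devices, and to repeat: don't use this unless you know why you need it:

mdadm --build /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1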

[TODO: Can you use this mode to create a temporary mirror, for the purpose of backing up your live data?]

Misc

This contains all the bits that don't really fit anywhere else.

Array internals and how they affect mdadm

The first arrays did not have a superblock, and were declared in mdadm.conf. This could obviously lead to disaster if drives were moved between machines, as they were identified by their partition type. Adding a new drive could shuffle the device names around, so when the array was assembled from mdadm.conf it could end up with the wrong drives in the wrong places.

Presumably to fix this, the version 0.9 superblock was defined, stored at the end of the device. It is also referred to as the version 0 superblock, the 0 referring to the internals of the superblock. However, it lacks support for most of the modern features of mdadm, and is now obsolete. The dual naming is also ambiguous, leading to sysadmin confusion - never a good idea.

To fix this, a new version of the superblock was defined, version 1. The layout is common across all subversions, 1.0, 1.1 and 1.2.

Version 1.0 is also stored at the end of the device. This means that 0.9 can be upgraded to 1.0. It also means that, now that raid assembly is no longer supported in the kernel, the only supported way to boot from raid without an initramfs is to use a v1.0 mirror.

Version 1.1 is stored at the start of the device. This is not the best of places, as a wayward fdisk (or other program) sometimes writes to the start of a disk and could destroy the superblock.

Version 1.2 is stored 4K from the start of the device.

Both 1.1 and 1.2 use the same algorithms to calculate the spare space left at the start of the device.

mdadm is unable to move the superblock, so there is no way of converting between the different version 1s.
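
Because of this, the place to choose the superblock format is at creation time, with the --metadata (or -e) option - for example, if you want a v1.0 mirror to boot from without an initramfs (example devices):

mdadm --create /dev/md/boot --metadata=1.0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1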

There are also two other superblock formats, ddf and imsm. These are "industry standard", not linux specific, and aren't being covered in the 2016 rewrite.

These superblocks also define a "data offset". This is the gap between the start of the device and the start of the data, so a v1.2 device will always have a data offset of at least 4K. This space can be used for all sorts of things, typically the write-intent bitmap, the bad blocks log, and the backup area when reshaping an array. The offset is usually calculated automatically, but can be overridden.
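
You can see what offset was chosen for an existing member with --examine (example device):

mdadm --examine /dev/sda1 | grep -i offset    # reports the data offset (and super offset) in sectors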

When you are reshaping an array, version 1.1 and 1.2 superblocks have an important advantage - the data offset provides spare space which mdadm will use to protect your data while it rearranges it. If you're adding a drive, the new drive provides spare space too. This means that the --backup-file option is usually not required.
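
If mdadm does ask for a backup file (typically on older arrays without that spare space), the option looks something like this (the array name, device count and path are just examples):

mdadm --grow /dev/md/name --raid-devices=4 --backup-file=/root/md-grow.backup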

Cookbook

Assembling your arrays

mdadm --assemble --scan

This is the command that runs in the background at boot, and assembles and runs all your arrays (unless something goes wrong, in which case you usually end up with a partially assembled array - which can be a right pain if you don't realise that's what's happened).
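
If you suspect that has happened, /proc/mdstat and --detail will show which devices made it into the array, and --run will force a partially assembled (degraded) array to start - only do that if you have decided it is what you want (example array name):

mdadm --detail /dev/md0    # list the member devices and their state
mdadm --run /dev/md0       # force the array to start degraded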

Creating an array

Creating a mirror raid

The simplest example of creating an array, is creating a mirror.

mdadm --create /dev/md/name --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

This will copy the contents of sda1 to sdb1 and give you a clean array. There is no reason why you can't use the array while it is copying (resyncing). The resync can be suppressed with the "--assume-clean" option, but you should only do this if you know the partitions have been wiped to null beforehand. Otherwise, the unused space will not really be mirrored, and any check command will moan blue murder.

Creating a parity raid

Now let's create a more complicated example.

mdadm --create /dev/md/name --level=5 --raid-devices=3 --spare-devices=1 --bitmap=internal /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

This, unsurprisingly, creates a raid 5 array. We've given it four drives, but told it to use only three as active devices, with the fourth as a spare. So two of the drives will be assembled into a degraded array, the third drive will be resync'd to fix the parity, and the fourth drive will be marked as a spare. An internal bitmap has been declared, so the array keeps track of which blocks have been updated and which still need to be. This means that if a drive gets kicked out for some reason, it can be re-added without needing a total resync.
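
The re-add itself is a manage-mode operation, along these lines (example device):

mdadm --manage /dev/md/name --re-add /dev/sdd1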

A bitmap will be created by default if the array is over 100GB in size. Note that this is a fairly recent change, and if you are running on an old kernel you may have to delete the bitmap if you wish to use many of the "grow" options.
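
Adding or removing a bitmap on an existing array is a grow-mode operation:

mdadm --grow /dev/md/name --bitmap=none        # remove the internal bitmap
mdadm --grow /dev/md/name --bitmap=internal    # add it back afterwards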

The raid by default will be created in degraded mode and will then resync. This is because, unless all your drives are blank, any integrity check would (just as with a mirror) moan blue murder that the unused parts of your array contain garbage and the parity is wrong.

Growing an array

BACK UP. BACK UP !! BACK UP !!!!

You should not lose data - mdadm is designed to fail safe, and even when things go completely pear-shaped, the array should still assemble and run, letting you recover if the worst comes to the worst.

Note also that, if you do not have a modern kernel, these commands may fail with an error "Bitmap must be removed before size/shape/level can be changed".

Adding a drive to a mirror

This will add a new drive to your mirror. The "--raid-devices" option is optional: if you increase the number of raid devices, the new drive will become an active part of the array and the existing drives will mirror across to it. If you don't increase the number of raid devices, the new drive will be a spare, and will only become part of the active array if one of the other drives fails.

mdadm --grow /dev/md/mirror --add /dev/sdc1 [--raid-devices=3]

Upgrading a mirror raid to a parity raid

The following commands will convert a two-disk mirror into a degraded two-disk raid5 (?Actually a raid4 with missing parity?), and then add the third disk for a fully functional raid5 array. Note that the first command will fail if run on the array we've just grown in the previous section - you cannot change level on anything other than a two-disk mirror.

mdadm --grow /dev/md/mirror --level=5
mdadm --grow /dev/md/mirror --add /dev/sdc1 --raid-devices=3

[TODO: make sure the commands are correct!]

Removing a disk from an array

This will convert the mirror from the first section into a degraded three-disk mirror, and then into a healthy two-disk mirror. Note that using OpenSUSE Leap 42 I had problems reducing the device count to 2.

mdadm --manage /dev/md/mirror --fail /dev/sdc1 --remove /dev/sdc1
mdadm --grow /dev/md/mirror --raid-devices=2

[TODO: make sure the commands are correct!]

Managing an array
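
As a sketch of the sort of commands that belong here (the devices are just examples): adding a spare to an existing array, and failing then removing a dying disk:

mdadm --manage /dev/md/name --add /dev/sde1                        # add a spare (or a replacement drive)
mdadm --manage /dev/md/name --fail /dev/sdb1 --remove /dev/sdb1    # fail a disk, then remove it from the array

See the man page for the full set of manage-mode options.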

Back to Converting an existing system Forward to Monitoring your system