A guide to mdadm
|Back to Converting an existing system Forward to Monitoring your system|
This page is an overview of mdadm. It is NOT intended as a replacement for the man pages - anything covered in detail there will be skimmed over here. This is meant to provide examples that you can adapt for yourselves.
mdadm has replaced all the previous tools for managing raid arrays. It manages nearly all the user space side of raid. There are a few things that need to be done by writing to the /proc filesystem, but not much.
This is a pretty standard part of any distro, so you should use your standard distro software management tool. If, however, you are having any problems it does help to be running the absolute latest version, which can be downloaded with
In the absence of any other preferences, it belongs in the /usr/local/src directory. As a linux-specific program there is none of this autoconf stuff - just follow the instructions as per the INSTALL file.
mdadm has seven modes. You will normally only use a few of them. They are as follows:-
This is probably the mode that is used most, but you won't be using it much - it happens in the background. Every time the system is booted, this needs to run. It scans the drives, looking for superblocks, and rebuilds all the arrays for you. This is why you need an initramfs when booting off a raid array - because mdadm is a user-space program, if root is on an array then we have a catch-22 - we can't boot until we have root, and we can't have root until we've booted and can run mdadm.
This is the first of the two modes you will use a lot. As the name implies, it creates arrays, and writes the superblocks for arrays that have them. It also fires off initialisation - making sure that the disks of a mirror are identical, or that on a parity array the parities are correct. This is why raids 5&6 are created in degraded mode - if they weren't then any check of the raid would spew errors for areas that hadn't been written to.
A bit of a misnomer, this mode takes care of all operations that changed the size of an array, such as changing the raid level, changing the number of active devices, etc.
Almost the same as "grow", this takes care of managing arrays, rather than managing one single array. It's used to add spares (which can be shared across multiple arrays), and as a matter of symmetry it is also used to remove (failed) devices.
Follow or Monitor
This is a relic of when superblocks didn't exist. It is used to (re)create an array, and should not be used unless you know exactly what you are doing. Because there are no superblocks, or indeed, any array metadata, the system just has to assume you got everything right because it has no way of checking up on you.
[TODO: Can you use this mode to create a temporary mirror, for the purpose of backing up your live data?]
This contains all the bits that don't really fit anywhere else.
Array internals and how it affects mdadm
The first arrays did not have a superblock, and were declared in mdadm.conf. This obviously could lead to a disaster if drives were moved between machines, as they were identified by their partition type. Adding a new drive could move the device names around and, relying on mdadm.conf, when the array was assembled it could get the wrong drives in the wrong place.
Presumably to fix this, the 0.9 version superblock was defined, stored at the end of the device. It's also referred to as the version 0 superblock, 0 referring to the internals of the superblock. However, it lacks support for most of the modern features of mdadm, and is now obsolete. It is also ambiguous, leading to sysadmin confusion, never a good idea.
To fix this, a new version of the superblock was defined, version 1. The layout is common across all subversions, 1.0, 1.1 and 1.2.
Version 1.0 is also stored at the end of the device. This means that 0.9 can be upgraded to 1.0. It also means that, now that raid assembly is no longer supported in the kernel, the only supported way to boot from raid without an initramfs is to use a v1.0 mirror.
Version 1.1 is stored at the start of the device. This is not the best of places, as a wayward fdisk (or other programs) sometimes writes to the start of a disk and could destroy the superblock.
Version 1.2 is stored 4K from the start of the device.
Both 1.1 and 1.2 use the same algorithms to calculate the spare space left at the start of the device. Version 1.0 will leave 128K at the end of each device if it is large enough - "large enough" being defined as 200GB.
mdadm is unable to move the superblock, so there is no way of converting between the different version 1s.
There are also two other superblock formats, ddf and imsm. These are "industry standard", not linux specific, and aren't being covered in the 2016 rewrite.
These superblocks also define a "data offset". This is the gap between the start of the device, and the start of the data. This means that v1.2 will always have 4K at least per device. This space can be used for all sorts of things, typically the write-intent bitmap, the bad blocks log, and the backup area when reshaping an array. It is usually calculated automatically, but can be over-ridden.
All operations that involve moving data around are called reshapes, and require some form of backup. Typically, the new layout will overwrite the old layout so the data is in danger, and the whole point of raid is to protect against that danger. About the only operation that does not need a backup is changing the data offset. This is the big advantage of the new v1 superblock - it enables a reshape operation to use spare space in the array to store the backup, without needing the --backup-file option.
When reshaping an array, mdadm will lock the stripe being updated, back it up, and rewrite it. Any reads or writes to the array will see if they are above or below the stripe being reshaped, and use the appropriate algorithm. If they hit the stripe being rewritten, they have to wait. When adding a device, the reshape will start at the beginning of the array, because after the first few stripes a gap will open up on disk and the new stripe will not overwrite the old one. If removing a device, it will start at the top and move down because this starts with a gap, which shrinks until only at the very end it starts overwriting the strip being rewritten.
Assembling your arrays
mdadm --assemble --scan
This is the command that runs in the background at boot, and assembles and runs all your arrays (unless something goes wrong, when you usually end up with a partially assembled array. This can be a right pain if you don't realise that's what's happened).
Creating an array
Creating a mirror raid
The simplest example of creating an array, is creating a mirror.
mdadm --create /dev/md/name /dev/sda1 /dev/sdb1 --level=1 --raid-devices=2
This will copy the contents of sda1 to sdb1 and give you a clean array. There is no reason why you can't use the array while it is copying (resyncing). This can be suppressed with the "--assume-clean" option, but you should only do this if you know the partitions have been wiped to null beforehand. Otherwise, the dead space will not be a mirror, and any check command will moan blue murder.
Creating a parity raid
Now let's create a more complicated example.
mdadm --create /dev/md/name /dev/sda1 /dev/sdb1 /dev/sdc1 --level=5 --raid-devices=3 --bitmap=internal
This, unsurprisingly, creates a raid 5 array. When creating the array, you must give it exactly the the number of devices it expects, ie 3 here. So two of the drives will be assembled into a degraded array, and the third drive will be resync'd to fix the parity. If you want to add a fourth drive as spare, this must be done later. An internal bitmap has been declared, so the superblock will keep track of which blocks have been updated and which blocks need to be updated. This means that if a drive gets kicked, for some reason, it can be re-added back without needing a total resync.
A bitmap will be created by default if the array is over 100GB in size. Note that this is a fairly recent change, and if you are running on an old kernel you may have to delete the bitmap if you wish to use many of the "grow" options.
The raid by default will be created in degraded mode and will resync. This is because, unless all your drives are blank (just like creating a mirror) any integrity check will moan blue murder that the unused parts of your array contain garbage and the parity is wrong.
Growing an array
BACK UP. BACK UP !! BACK UP !!!!
You should not lose data - mdadm is designed to fail safe, and even when things go completely pear-shaped, the array should still assemble and run, letting you recover if the worst comes to the worst.
Note also that, if you do not have a modern kernel, these commands may fail with an error "Bitmap must be removed before size/shape/level can be changed".
Adding a drive to a mirror
This will add a new drive to your mirror. The "--grow / --raid-devices" is optional, if you increase the number of raid devices, the new drive will become an active part of the array and the existing drives will mirror across. If you don't increase the number of raid devices, the new drive will be a spare, and will only become part of the active array one of the other drives fails.
mdadm [--grow] /dev/md/mirror --add /dev/sdc1 [--raid-devices=3]
Upgrading a mirror raid to a parity raid
The following commands will convert a two-disk mirror into a degraded two-disk raid5, and then add the third disk for a fully functional raid5 array. Note that the first command will fail if run on the array we've just grown in the previous section if you changed the number of raid-devices to three. If you have two devices and a spare the first command on its own will work. I haven't investigated whether more than two active devices plus a spare will work.
mdadm --grow /dev/md/mirror --level=5 mdadm --grow /dev/md/mirror --add /dev/sdc1 --raid-devices=3
Removing a disk from an array
This will convert the mirror from the first section into a degraded three-disk mirror, and then into a healthy two-disk mirror. Note that using OpenSUSE Leap 42 I had problems reducing the device count to 2.
mdadm /dev/md/mirror --fail /dev/sdc1 --remove /dev/sdc1 mdadm --grow /dev/md/mirror --raid-devices=2
[TODO: make sure the commands are correct!]
Adding more space without adding another device
You might not have sufficient SATA ports to add any more disk drives, but you want to add more space to your array. The procedure is pretty much the same as for replacing a dodgy device.
If you have a three-disk mirror, you can just fail a drive and replace it with a larger drive, and repeat until all your drives have been replaced with bigger drives. If you've only got a two-drive mirror, the earlier advice to get an add-in SATA PCI card or somesuch applies. Either add the new drive then fail the old one, or put the old one in a USB cage on a temporary basis and --replace it.
If you have a parity raid, then if you have a raid-6, again you can just fail a drive and then add in a new one, but this is not the best idea. If you don't want to place your data at risk, then for a raid-5 you *must*, and for a raid-6 you *should*, use a spare SATA port to do a --replace, or swap your old drive into a USB cage so you can do a --replace that way.
Once you've increased the size of all the underlying devices, you can increase the size of the array, and then increase the size of the filesystem or partitions in the array.
Managing an array
Changing the size of the array
Adding a new device to a mirror won't change the size of the array - the new device will store an extra copy of the data and there won't be any extra space. But adding a device to a parity array will normally increase the space available. Assuming we have a 3 by 4TB raid-5, that gives us an 8TB array. Adding a fourth 4TB will give us a 12TB array.
Assuming it's possible, adding a 3TB device would only give us an extra 1TB, as the formula for raid-5 is devices-1 times size-of-smallest-device. This is, however, unlikely to be possible. To do so, you would probably have to reduce the the size of the array to 6TB (--size=3TB) before adding the new drive (which would then increase capacity to 9TB).
Adding that fourth drive has increased the size of the array, which we can take advantage of by resizing the file system on it (see the next section). But what happens if we've added a 5TB drive to our three 4TB drives? Unfortunately, that extra 1TB is wasted because this drive is larger. So let's assume that rather than adding a fourth - 5TB - drive, we've actually replaced all 4TB drives to give us a 3 x 5TB drive. Unfortunately, we've still only got an 8TB array. We need to tell mdadm about the extra space available with
mdadm --grow /dev/mdN --size=max
The --size option tells the array how much of each disk to use. When an array is created it defaults to the size of the smallest drive but when, as here, we have replaced all the drives with larger drives we need to explicitly tell mdadm about the space. The "max" argument tells it, once again, to default to the size of the smallest drive currently in the array.
Note that there has been the odd report of --size=max not working. Make sure the kernel knows about the new disk size, as this was the problem - you may have to re-hot-add the drive or reboot the computer.
Using the new space
If the array was originally partitioned, the new space will now be available to resize existing partitions or add new ones. If the array had a filesystem on it, the filesystem can now be expanded. Different filesystems have different commands, for example
|Back to Converting an existing system Forward to Monitoring your system|