What do you want in your stack?

Revision as of 10:18, 3 December 2017

Because Linux is modular, a lot of functionality is duplicated by various components. For example, modern filesystems try to be all things to all men, while other components such as Device Mapper and md-raid are more single-function components.

Which component provides what
Component	Combine Devices	Grow and Shrink	Redundancy	Integrity Check	Snapshots
Raid	Yes	Yes	Yes	Not default	No
LVM	Yes	Yes	No	No	Yes
Btrfs	Yes	Grow only?	Yes	Yes	Yes

Why not btrfs (or another modern filesystem)

Looking at the table, and seeing that btrfs provides all five capabilities, you might wonder why you would bother with raid and lvm - why not just use btrfs, or another modern filesystem? The reality is that not all filesystems provide all the features you may want, nor will they necessarily provide them in the way you want.

Take btrfs for example. You may want to have several mount points, so that filling up say /home doesn't cause your mail system on /var to fall over. Btrfs provides redundancy - but only raid-1 is reliable at present. That said, btrfs applies redundancy at a far finer-grained level than can be done by lvm or md-raid lying underneath the file system. The current strengths of btrfs are perceived to be snapshotting and integrity checking.

The "obvious" stack

Apologies if you disagree with my definition of obvious.

These layers can be stacked in pretty much any order (apart from the physical device at the bottom, and the filesystem at the top), because as far as Linux is concerned they are all just block devices. It is not a good idea to be too clever, as the more layers you have, the harder it becomes to keep track of what is happening, and that will lead to possibly disastrous mistakes. KISS. And document!

The physical layer

At the bottom of the stack you will have the real devices that underlie the operating system, such as a real hard drive, or network attached storage, or similar. Make sure you understand what you have here - your redundancy relies on information being duplicated across different physical drives. If you are using network attached storage, it's even better if your data is duplicated across different NAS boxes or computers.

If you create a mirror from devices sda1 and sda2, it's pretty obvious that a single hard drive failure is going to take out both partitions and lose your data. Creating your mirror from sda and sdb will protect against a single hard drive failing, but what if a power surge takes out the computer and damages both drives at the same time? Using NAS will protect against this.

But then you have to look at your network - if you are mirroring over the network then you need to ensure that the network is fast enough and your computer is configured with sufficient cache so that the disk subsystem does not grind to a halt waiting for the network to respond. If you are using parity raid over the network, then speed becomes even more important, because if the network slows down, the raid will hang waiting on it. You also need to understand the physical structure at the other end of the network cable. Is the NAS using a single physical device to back both of the network devices you are mirroring across?

(That being said, there are very good reasons for using multiple block devices, e.g. sda1 and sda2, on the same physical device, but this should only be for data recovery purposes. Never do it as part of a planned data layout, only as a temporary measure to get out of a hole.)

LUKS

This belongs best either here or just after the partition layer. LUKS encrypts your partition or disk. I do not use it, so I don't know much about it.

Its great advantage is that by deleting the key, you instantly turn the disk or partition into random garbage, great for secure deletion.

Its great disadvantage is that by losing the key, you instantly turn the disk or partition into random garbage. Without a backup, you've had it.

The disk partition layer (MBR / GPT)

If you are booting from the disk, then you need a partition layer. Grub (1 or 2), Lilo (1 or 2), EFI, and Windows, all require it.

If you want multiple md-raid arrays on the same physical device, then you need it to subdivide the disk. (You could put LVM on the disk then subdivide that.)

Because these all reserve the first megabyte (2048 x 512-byte sectors) for themselves, it is very common to create a full-disk partition rather than use bare disks, not least because accidentally creating a partition table on a raid device can easily destroy the raid.
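
The "first megabyte" figure is simple arithmetic, and can be checked with a few lines of Python. This is only a sketch: it assumes 512-byte logical sectors and a first partition starting at sector 2048, and the names SECTOR_SIZE, FIRST_SECTOR and reserved_bytes are made up for the illustration.

```python
SECTOR_SIZE = 512      # bytes per logical sector (assumed; 4096-byte sectors also exist)
FIRST_SECTOR = 2048    # sector where the first partition is assumed to start


def reserved_bytes(first_sector: int = FIRST_SECTOR,
                   sector_size: int = SECTOR_SIZE) -> int:
    """Return the number of bytes left free in front of the first partition."""
    return first_sector * sector_size


print(reserved_bytes())                    # 1048576
print(reserved_bytes() // (1024 * 1024))   # 1, i.e. one mebibyte
```

With a different sector size the same start sector reserves a different amount of space, which is why the sector size is spelled out as an assumption here.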

The md-raid layer

The purpose of raid (as implied by the name) is to combine multiple disks into one, configured to survive a disk failure.

It has been extended to combine multiple devices of different sizes to maximise storage, for which you will want linear mode or raid-0. Linear fills up one device before going on to the next, and the next, and the next ... Striping (raid-0) writes a chunk to the first device, then the next, then the next until it runs out of devices and returns to the first. Both add all the device capacities together. And both are very vulnerable to data loss. If this is what you want, you are better off using a modern filesystem which understands multiple devices and won't lose everything just because one disk failed.
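
The difference between the linear and striped layouts can be sketched as a mapping from a logical chunk number to a device and a position on that device. This is a toy model, not md's actual code: linear_map and striped_map are hypothetical names, sizes are counted in chunks, and the striped version assumes all devices are the same size.

```python
def linear_map(chunk, sizes):
    """Linear layout: fill device 0 completely, then device 1, and so on.

    sizes is a list of device capacities measured in chunks.
    Returns (device index, chunk index on that device).
    """
    for dev, size in enumerate(sizes):
        if chunk < size:
            return dev, chunk
        chunk -= size
    raise IndexError("chunk is beyond the end of the array")


def striped_map(chunk, n_devices):
    """Striped layout: deal chunks out round-robin (equal-sized devices assumed)."""
    return chunk % n_devices, chunk // n_devices


sizes = [4, 2, 3]                 # three devices of 4, 2 and 3 chunks
print(linear_map(5, sizes))       # (1, 1): second device, its second chunk
print(striped_map(5, 3))          # (2, 1): third device, its second chunk
```

In both layouts every chunk lives on exactly one device, which is why losing any one device loses data.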

If however you want redundancy to protect against a disk failure, and optionally to protect against corruption, then raid is what you want. Raids 1 and 10 provide mirroring - storing multiple copies of data to protect against a disk failure. Raid 5 provides parity - for each stripe it uses one disk to store parity and the remaining disks to store the data. So if a disk is lost it can be recreated. Raid 6 is similar to raid 5 but uses two disks to store parity so if two disks are lost they can be recreated, or if the array suffers random corruption then provided only one block per stripe is corrupted, it can be found and recreated.
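
The way a single parity block lets a lost disk be recreated can be shown with XOR. This is a minimal sketch of the raid-5 idea for one stripe only: xor_blocks is a made-up helper, the block contents are invented, and real md rotates the parity between disks. The second parity block used by raid-6 is calculated differently and is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data disks plus one parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second disk has failed: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])   # True
```

The same calculation works whichever single block of the stripe is missing, including the parity block itself.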

Raid also allows you to grow or shrink the array, or change levels. Md-raid for the most part has no problem adding and removing disks, and as raids 1, 10, 4, 5 and 6 all have modes where the disk layout is identical across two or more raid levels (for example, a two-disk raid-1 is identical to a two-disk raid-4 is identical to a two-disk degraded raid-5), it can usually also change easily between the various levels. Only far raid 10 and raid-6 are problematic - far 10 cannot be reshaped, and because raid-6 uses a different algorithm to the others it needs to be converted to ?raid-4? before it can be converted to anything else.
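
Why a two-disk raid-1 and a two-disk raid-4 can hold the same contents follows from the parity calculation: with only one data disk, the XOR parity of a stripe is just a copy of the data. A small sketch, assuming XOR parity and ignoring md's metadata and on-disk details (the function name parity and the sample block are made up):

```python
def parity(data_blocks):
    """Return the XOR parity over the data blocks of one stripe."""
    out = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Two-disk raid-4: one data disk and one parity disk.
stripe = [b"hello"]
parity_disk = parity(stripe)

print(parity_disk == stripe[0])   # True: the parity disk is a mirror of the data disk
```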

The array partition layer (MBR / GPT)

Once you've got your array, you can now partition this as if it were a physical drive. However, once you've got this far, you would be far better off using LVM. Partitioning at this point offers no real benefits over splitting the drives up and having multiple arrays over the partitions.

The Logical Volume Manager (LVM)

Think of LVM as fdisk on steroids. Both divide up a device into partitions. But whereas fdisk is relatively simple - you give it a device and tell it where you want the partition to start and end, LVM has many more tricks up its sleeve. Once you've told fdisk where on the device to put the partition, that's it. Everything is fixed.

Whereas with LVM you can combine multiple devices into one (a bit like raid-0). Rather than telling LVM where to put the partition, you tell it how much space you want, and it figures out where it will fit in the volume. You can even tell it to move a partition from one volume to another - a bit like moving a partition from one disk to another.
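
The idea of asking for an amount of space and letting the volume manager decide where it goes can be pictured with a toy allocator. This is not LVM's real allocation policy, only a sketch: the allocate function, the device names and the extent counts are all invented for the illustration.

```python
def allocate(free_extents, wanted):
    """Choose extents for a new volume from a pool of free extents.

    free_extents maps a device name to its number of free extents and is
    updated in place. Returns a list of (device, extents taken) pairs.
    """
    if wanted > sum(free_extents.values()):
        raise ValueError("not enough free space in the pool")
    plan = []
    for dev in sorted(free_extents):
        if wanted == 0:
            break
        take = min(free_extents[dev], wanted)
        if take:
            plan.append((dev, take))
            free_extents[dev] -= take
            wanted -= take
    return plan


pool = {"sda1": 100, "sdb1": 50}
print(allocate(pool, 120))   # [('sda1', 100), ('sdb1', 20)]
print(pool)                  # {'sda1': 0, 'sdb1': 30}
```

The caller only says how much space is wanted; the request is spread over two devices without the caller having to choose a start or end position.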

You can use it to take snapshots - it will freeze the current partition and make it Copy On Write (COW) so new stuff gets written elsewhere and you can go back to the frozen version.
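
One way to picture copy-on-write is that the first time a block changes after the snapshot is taken, the old contents are kept aside so that the frozen view can still be read. The sketch below is a toy model of that idea, not LVM's on-disk format; the Volume class and its methods are hypothetical.

```python
class Volume:
    """A toy block device that supports copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Start a snapshot: an initially empty store of preserved blocks."""
        snap = {}   # block index -> contents at snapshot time
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Write a block, preserving the old contents for each snapshot once."""
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Read a block as it was when the snapshot was taken."""
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")

print(vol.blocks)                   # ['a', 'BB', 'c']
print(vol.read_snapshot(snap, 1))   # b
print(vol.read_snapshot(snap, 0))   # a
```

Taking the snapshot copies nothing; space is only used for blocks that change afterwards.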

And you can grow or shrink your partitions as required if you need the space.

