Superblock

From Linux Raid Wiki
Revision as of 09:51, 5 October 2013 by Gabriel (Talk | contribs)


Linux RAID reserves a small amount of space (called a superblock) on each component device. This space holds metadata about the RAID device and allows the array to be assembled correctly.

There are several versions of superblocks but they can be split into 3 groups:

  • ancient (pre-0.9)
  • 0.9
  • 1.0 to 1.2

Details of the on-disk superblock formats can be found here: RAID_superblock_formats

The ancient superblocks are out of scope for this wiki and aren't used by mdadm or md.

1.x superblocks

1.x superblocks are new(ish)

The version numbers simply indicate where the superblock is stored on the individual component devices.

Version 1.0 is stored near the end of the device (at least 8K, and less than 12K, from the end). This is useful, especially with RAID-1 devices, because the RAID filesystem and the non-RAID filesystem start in exactly the same place, so if you can't get RAID up you can often mount a component directly in read-only mode and get the data off. But it carries the small risk that writing past the end of the filesystem (when it's mounted like that, without going through md) will wreck your array.

Version 1.1 is stored at the start of the device. This eliminates the overwriting risk but prevents you from mounting a component directly without going through md.

Version 1.2 is like version 1.1 but stores the superblock 4K from the device start. This is particularly useful if you make a RAID array on a whole device, because this misses partition tables, master boot records, and the like.
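The three locations can be sketched as a small shell function. This is a minimal sketch, not mdadm's actual code: the function name sb_offset_sectors is hypothetical, sizes are in 512-byte sectors, and the 1.0 case assumes the superblock sits 8K back from the end of the device, rounded down to a 4K boundary, which is consistent with the "at least 8K, and less than 12K" range given above.

```shell
#!/bin/sh
# Hypothetical helper: print the expected superblock offset, in 512-byte
# sectors, for a 1.x superblock version and a device size in sectors.
# Usage: sb_offset_sectors VERSION DEVICE_SECTORS
sb_offset_sectors() {
    case "$1" in
        # 1.0: step back 8K (16 sectors) from the end, then round down
        # to a 4K (8-sector) boundary. Assumed formula, see the text above.
        1.0) echo $(( ($2 - 16) & ~7 )) ;;
        # 1.1: at the very start of the device.
        1.1) echo 0 ;;
        # 1.2: 4K (8 sectors) from the start of the device.
        1.2) echo 8 ;;
        *)   echo "unknown superblock version: $1" >&2; return 1 ;;
    esac
}
```

For example, `sb_offset_sectors 1.0 "$(blockdev --getsz /dev/sdb1)"` would print the expected sector for a component, where /dev/sdb1 is only a placeholder device name.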

In practice they're nearly identical, sharing nearly all of their code.

Do note that the in-kernel autodetect (based on partition type 0xFD) only works for version 0.90 superblocks.

As a workaround, distributions (Ubuntu and Fedora at least, circa early 2009) include init scripts that start any arrays not started by autodetect, which can include arrays using the newer 1.x superblocks.

Using Fedora 9 as the example, the init script that does this is /etc/rc.d/rc.sysinit. The command used (where -As is short for --assemble --scan) is:

   # Start any MD RAID arrays that haven't been started yet
   [ -f /etc/mdadm.conf -a -x /sbin/mdadm ] && /sbin/mdadm -As --auto=yes --run

The kernel autodetect can't assemble arrays that use any of the version 1.x superblocks, and LILO can't boot off them.
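To find out whether a given component relies on userspace assembly, its superblock version can be read from the output of mdadm --examine. The following is a minimal sketch: the function names sb_version and needs_userspace_assembly are hypothetical, and it assumes the output contains a line of the form "Version : 1.2" (or "Version : 0.90.00" for the old format).

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# the value of the first "Version :" line.
sb_version() {
    sed -n 's/^[[:space:]]*Version[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: succeed (exit 0) when the given version is NOT
# handled by in-kernel autodetect, i.e. anything other than 0.90.
needs_userspace_assembly() {
    case "$1" in
        0.90*) return 1 ;;   # kernel autodetect can handle this one
        *)     return 0 ;;   # 1.x: must be assembled by mdadm
    esac
}
```

A possible use, with /dev/sdb1 as a placeholder device name, is `v=$(mdadm --examine /dev/sdb1 | sb_version)` followed by `needs_userspace_assembly "$v" && echo "assemble with mdadm"`.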

Version 1 superblocks allow for an arbitrarily large internal bitmap. They do this by explicitly giving the data-start and data-size. So e.g. a 1.1 superblock could make

   data-start==1Gig
   data-size == devicesize minus 1Gig

and put the bitmap after the superblock (which is at the start) and use nearly one Gig for the bitmap.
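The arithmetic behind this layout can be sketched in shell. This is only an illustration under stated assumptions: the helper names data_size and bitmap_bytes are hypothetical, all sizes are in bytes, and the bitmap estimate assumes one on-disk bit per bitmap chunk and ignores any bitmap header.

```shell
#!/bin/sh
# One Gig, in bytes.
GIB=$((1024 * 1024 * 1024))

# Hypothetical helper: data-size is whatever remains after data-start.
# Usage: data_size DEVICE_SIZE DATA_START
data_size() {
    echo $(( $1 - $2 ))
}

# Hypothetical helper: bytes of bitmap needed to cover DATA_SIZE with
# chunks of BITMAP_CHUNK bytes, ASSUMING one bit per chunk (rounded up).
# Usage: bitmap_bytes DATA_SIZE BITMAP_CHUNK
bitmap_bytes() {
    echo $(( (($1 + $2 - 1) / $2 + 7) / 8 ))
}
```

For a 500 Gig device with data-start at 1 Gig, `data_size $((500 * GIB)) $GIB` gives the 499 Gig data-size, and `bitmap_bytes` with a 64K chunk shows how little of the reserved Gig such a bitmap would actually need under the one-bit-per-chunk assumption.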

However mdadm isn't quite so accommodating. It should:

  • when creating an array without a bitmap, leave a reasonable amount of space for one to be added in the future (32-64k).
  • when dynamically adding a bitmap, see how much space is available and use up to that much
  • when creating an array with a bitmap, honour any --bitmap-chunk setting (or the default) and reserve an appropriate amount of space.

I think it might do the first. I don't think it does the other two. Maybe in 2.6 or 2.7.
