RAID superblock formats

From Linux Raid Wiki
Revision as of 21:00, 31 March 2009 by Grangerx (Talk | contribs)



Currently, the Linux RAID subsystem recognizes two distinct superblock formats.

They are known as the "version-0.90" and "version-1" superblock formats.

A Note about kernel autodetection of different superblock formats

Current Linux kernels (as of 2.6.28) can autodetect only arrays that use the version-0.90 superblock. Autodetection relies on the partition type being set to FD.

The boot loader LILO can likewise boot only from arrays that use the version-0.90 superblock. Alternative boot loaders, GRUB specifically, probably do not have this particular limitation.

As a workaround for the kernel-autodetection issue, several distributions (including Ubuntu and Fedora, circa early 2009) ship init scripts that start any arrays not already started by autodetection. This can include arrays using the newer 1.x superblocks.

Using Fedora 9 as the example, the initscript file that does this is named /etc/rc.d/rc.sysinit. The command used is:

   # Start any MD RAID arrays that haven't been started yet
   [ -f /etc/mdadm.conf -a -x /sbin/mdadm ] && /sbin/mdadm -As --auto=yes --run


The version-0.90 Superblock Format

The version-0.90 superblock format has several limitations. It limits the number of component devices within an array to 28, and limits each component device to a maximum size of 2TB.


The version-1 Superblock Format

The version-1 superblock format is a more expandable format, capable of supporting arrays with 384+ component devices and of storing sizes as 64-bit sector counts.

Sub-versions of the version-1 superblock

The "version-1" superblock format is currently used in three different "sub-versions".

The sub-versions differ primarily (solely?) in the location on each component device at which they actually store the superblock.

Sub-Version  Superblock Position on Device
1.0          At the end of the device
1.1          At the beginning of the device
1.2          4K from the beginning of the device
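
As a rough illustration of the table above, the following Python sketch computes the sector at which each sub-version would place its superblock. The function name superblock_sector is hypothetical. The exact rule used for 1.0 (at least 8K before the end of the device, rounded down to a 4K boundary) is an assumption modelled on the kernel md driver; the table itself only says "at the end of the device".

```python
SECTOR = 512  # bytes per sector, the unit used throughout the superblock


def superblock_sector(sub_version: str, device_sectors: int) -> int:
    """Return the sector at which a version-1 superblock would start."""
    if sub_version == "1.1":
        return 0                    # at the beginning of the device
    if sub_version == "1.2":
        return 4096 // SECTOR       # 4K from the beginning -> sector 8
    if sub_version == "1.0":
        # ASSUMPTION (not stated in the table): step back 8K from the end
        # of the device, then round down to a 4K (8-sector) boundary.
        return (device_sectors - 16) & ~7
    raise ValueError(f"unknown sub-version: {sub_version}")


if __name__ == "__main__":
    one_gib = 2 * 1024 * 1024  # a 1 GiB device expressed in 512-byte sectors
    for version in ("1.0", "1.1", "1.2"):
        print(version, superblock_sector(version, one_gib))
```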

The version-1 superblock format on-disk layout

Total Size of superblock

Total Size of superblock: 256 Bytes, plus 2 bytes per device in the array
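
The size rule is simple arithmetic, sketched here in Python (the names are illustrative only). Note that 384 devices gives 256 + 768 = 1024 bytes.

```python
FIXED_BYTES = 256      # the fixed part of the superblock
BYTES_PER_DEVICE = 2   # one 16-bit role entry per device


def superblock_bytes(devices: int) -> int:
    """Total superblock size in bytes for an array with `devices` devices."""
    if devices < 0:
        raise ValueError("device count cannot be negative")
    return FIXED_BYTES + BYTES_PER_DEVICE * devices


if __name__ == "__main__":
    for count in (2, 4, 384):
        print(count, "devices:", superblock_bytes(count), "bytes")
```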

Section: Superblock/"Magic-Number" Identification area

16 Bytes, Offset 0-15 (0x00 - 0x0F)

Offset (Hex)  Offset (Dec)  Length (bytes)  Field Name     Data Type  Data Value  Usage/Meaning and Notes
0x00 - 0x03   0 - 3         4               magic          __u32      0xa92b4efc  "Magic Number" (Superblock ID); stored little-endian
0x04 - 0x07   4 - 7         4               major_version  __u32      1           Major Version of the Superblock
0x08 - 0x0B   8 - 11        4               feature_map    __u32      0           Feature Map: which extended features (such as volume bitmaps, recovery, or reshape) are in use on this array. Bit-mapped field; see the bit values below.
0x0C - 0x0F   12 - 15       4               pad0           __u32      0           Padding Block 0. Always set to zero when writing.

feature_map bit values:

Bit Value  Meaning
1          RAID Bitmap is used
2          RAID Recovery is in progress (see "recovery_offset")
4          RAID Reshape is in progress
8          undefined/reserved (0)
16         undefined/reserved (0)
32         undefined/reserved (0)
64         undefined/reserved (0)
128        undefined/reserved (0)
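
To make the identification area concrete, here is a minimal Python sketch that unpacks its four 32-bit fields and decodes the feature_map bits. The function and dictionary names are hypothetical, and the sample bytes are synthetic rather than read from a real device. The sketch assumes every field is little-endian, as the table notes for magic.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC
# Bit value -> short label, following the feature_map bit table.
FEATURE_BITS = {1: "bitmap", 2: "recovery", 4: "reshape"}


def parse_ident(raw: bytes) -> dict:
    """Decode bytes 0x00-0x0F of a version-1 superblock."""
    # ASSUMPTION: all four fields are little-endian 32-bit integers.
    magic, major, feature_map, pad0 = struct.unpack_from("<4I", raw, 0)
    if magic != MD_SB_MAGIC:
        raise ValueError(f"bad magic number: {magic:#010x}")
    features = [name for bit, name in FEATURE_BITS.items() if feature_map & bit]
    return {"major_version": major, "feature_map": feature_map,
            "features": features, "pad0": pad0}


# Synthetic example: version 1 with the bitmap and reshape bits set.
sample = struct.pack("<4I", MD_SB_MAGIC, 1, 1 | 4, 0)
ident = parse_ident(sample)

if __name__ == "__main__":
    print(ident)
```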


Section: Per-Array Identification & Configuration area

84 Bytes, Offset 16-99 (0x10 - 0x63)

Offset (Hex)  Offset (Dec)  Length (bytes)  Field Name     Data Type       Data Value  Usage/Meaning and Notes
0x10 - 0x1F   16 - 31       16              set_uuid       __u8[16]                    UUID for the Array(?). Set by the user-space formatting utility.
0x20 - 0x3F   32 - 63       32              set_name       char[32]                    Name for the Array(?). Set and used by user-space utilities.
0x40 - 0x47   64 - 71       8               ctime          __u64                       Creation Time(?). Low 40 bits are seconds; high 24 bits are microseconds.
0x48 - 0x4B   72 - 75       4               level          __u32                       RAID Level of the Array; see the level values below.
0x4C - 0x4F   76 - 79       4               layout         __u32                       Layout of the array (RAID5 (and 6?) only); see the layout values below.
0x50 - 0x57   80 - 87       8               size           __u64                       Used size of the component devices, in 512-byte sectors.
0x58 - 0x5B   88 - 91       4               chunksize      __u32                       Chunk size of the array, in 512-byte sectors; see the chunksize notes below.
0x5C - 0x5F   92 - 95       4               raid_disks     __u32           #           (?)Number of disks in the array(?); see the raid_disks notes below.
0x60 - 0x63   96 - 99       4               bitmap_offset  __u32 (signed)              Number of sectors after the superblock at which the bitmap starts; see the bitmap_offset notes below.

level values:

Value  Meaning
-4     Multi-Path
-1     Linear
0      RAID-0 (Striped)
1      RAID-1 (Mirrored)
4      RAID-4 (Striped with Dedicated Block-Level Parity)
5      RAID-5 (Striped with Distributed Parity)
6      RAID-6 (Striped with Dual Parity)

mdadm versions (as of v2.6.4) limit RAID-6 (creation) to 256 disks or fewer.

layout values:

Value  Meaning
0      left asymmetric
1      right asymmetric
2      left symmetric (default)
3      right symmetric

The layout controls the relative arrangement of data and parity blocks on the disks.

chunksize notes:

The default is 64K? for raid levels 0, 10, 4, 5, and 6.
chunksize is not used in raid levels 1, linear, and multi-path.
Note: During creation this appears to be created as a multiple of 1024 rather than 512.

raid_disks notes:

raid4 requires a minimum of 2 member devs
raid5 requires a minimum of 2 member devs
raid6 requires a minimum of 4 member devs
raid6 is limited to a maximum of 256 member devs

bitmap_offset notes:

This field is only valid if the feature_map bit with value 1 (RAID Bitmap is used) is set.
The value is signed, which allows the bitmap to appear before the superblock on the disk.
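
The per-array fields can be unpacked in one pass. The Python sketch below is illustrative only: the names are hypothetical and the sample superblock is synthetic. It assumes little-endian fields, and it reads level as a signed value because -4 and -1 are listed as valid levels. It also shows how ctime splits into seconds (low 40 bits) and microseconds (high 24 bits).

```python
import struct

# set_uuid, set_name, ctime, level, layout, size, chunksize, raid_disks,
# bitmap_offset: 84 bytes starting at offset 0x10.
ARRAY_AREA = struct.Struct("<16s32sQiIQIIi")
LEVELS = {-4: "multipath", -1: "linear", 0: "raid0", 1: "raid1",
          4: "raid4", 5: "raid5", 6: "raid6"}


def split_time(value: int) -> tuple:
    """Split a 64-bit timestamp into (seconds, microseconds)."""
    return value & ((1 << 40) - 1), value >> 40


def parse_array_area(sb: bytes) -> dict:
    (set_uuid, set_name, ctime, level, layout, size,
     chunksize, raid_disks, bitmap_offset) = ARRAY_AREA.unpack_from(sb, 0x10)
    seconds, usec = split_time(ctime)
    return {
        "set_uuid": set_uuid.hex(),
        "set_name": set_name.split(b"\0", 1)[0].decode("utf-8", "replace"),
        "ctime_seconds": seconds,
        "ctime_usec": usec,
        "level": LEVELS.get(level, f"unknown ({level})"),
        "layout": layout,
        "size_bytes": size * 512,        # size is in 512-byte sectors
        "chunk_bytes": chunksize * 512,  # chunksize is in 512-byte sectors
        "raid_disks": raid_disks,
        "bitmap_offset": bitmap_offset,  # signed: may be negative
    }


# Synthetic superblock: a 3-disk RAID-5 with 64K chunks.
sb = bytearray(256)
ARRAY_AREA.pack_into(sb, 0x10, bytes(range(16)), b"host:0",
                     (250 << 40) | 1238533200, 5, 2, 2048, 128, 3, -16)
array = parse_array_area(bytes(sb))

if __name__ == "__main__":
    for key, value in array.items():
        print(f"{key}: {value}")
```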


Section: RAID-Reshape In-Process Metadata Storage/Recovery area

28 Bytes, Offset 100-127 (0x64 - 0x7F)
(Note: Only contains valid data if feature_map bit '4' is set)

Offset (Hex)  Offset (Dec)  Length (bytes)  Field Name        Data Type  Data Value  Usage/Meaning and Notes
0x64 - 0x67   100 - 103     4               new_level         __u32                  The new RAID level being reshaped to; see level field (above)
0x68 - 0x6F   104 - 111     8               reshape_position  __u64                  Next address of the array to reshape, i.e. the current position of the reshape operation
0x70 - 0x73   112 - 115     4               delta_disks       __u32                  The change in the number of raid disks
0x74 - 0x77   116 - 119     4               new_layout        __u32                  New layout for the array; see layout field (above)
0x78 - 0x7B   120 - 123     4               new_chunk         __u32                  New chunk size; see chunksize field (above)
0x7C - 0x7F   124 - 127     4               pad1              __u8[4]    0           Padding Block 1. Always set to zero when writing.
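
Because these fields are only meaningful while a reshape is in progress, a reader should check feature_map first. The following Python sketch shows one way to do that. The names are hypothetical and the data is synthetic. Reading new_level and delta_disks as signed values is an assumption, made so that a shrinking array could be represented.

```python
import struct

FEATURE_RESHAPE = 4
# new_level, reshape_position, delta_disks, new_layout, new_chunk at 0x64.
RESHAPE_AREA = struct.Struct("<iQiII")


def parse_reshape(sb: bytes):
    """Return the reshape fields, or None if no reshape is in progress."""
    (feature_map,) = struct.unpack_from("<I", sb, 0x08)
    if not feature_map & FEATURE_RESHAPE:
        return None  # the area holds no valid data
    new_level, position, delta, new_layout, new_chunk = \
        RESHAPE_AREA.unpack_from(sb, 0x64)
    return {"new_level": new_level, "reshape_position": position,
            "delta_disks": delta, "new_layout": new_layout,
            "new_chunk": new_chunk}


# Synthetic superblocks: one idle, one growing by a single disk.
idle = bytes(256)
growing = bytearray(256)
struct.pack_into("<I", growing, 0x08, FEATURE_RESHAPE)
RESHAPE_AREA.pack_into(growing, 0x64, 5, 4096, 1, 2, 128)
reshape = parse_reshape(bytes(growing))

if __name__ == "__main__":
    print(parse_reshape(idle))
    print(reshape)
```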



Section: This-Component-Device Information area

64 Bytes, Offset 128-191 (0x80 - 0xBF)

Offset (Hex)  Offset (Dec)  Length (bytes)  Field Name          Data Type  Data Value  Usage/Meaning and Notes
0x80 - 0x87   128 - 135     8               data_offset         __u64                  Sector number at which data begins (often 0)
0x88 - 0x8F   136 - 143     8               data_size           __u64                  Number of sectors in the device that can be used for data
0x90 - 0x97   144 - 151     8               super_offset        __u64                  Number of the sector at which this superblock starts
0x98 - 0x9F   152 - 159     8               recovery_offset     __u64      sector #    Sectors before this offset (from data_offset) have been recovered
0xA0 - 0xA3   160 - 163     4               dev_number          __u32                  Permanent identifier of this device (Not its role in RAID(?))
0xA4 - 0xA7   164 - 167     4               cnt_corrected_read  __u32                  Number of read errors that were corrected by re-writing
0xA8 - 0xB7   168 - 183     16              device_uuid         __u8[16]               UUID of the component device. Set by user-space; ignored by the kernel.
0xB8          184           1               devflags            __u8                   Per-Device Flags. Bit-mapped field; see the bit values below.
0xB9 - 0xBF   185 - 191     7               pad2                __u8[7]    0           Padding Block 2. Always set to zero when writing.

devflags bit values:

Bit Value  Meaning
1          WriteMostly1
2          (?)
4          (?)
8          (?)
16         (?)
32         (?)
64         (?)
128        (?)

WriteMostly1 indicates that this device should only be updated on writes, not read from. (Useful with slow devices in RAID1 arrays?)
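
A short Python sketch of reading the per-device area follows. The names are hypothetical, the input is synthetic, and little-endian storage is assumed. The struct format mirrors the table row by row, and the seven bytes of pad2 are skipped.

```python
import struct

# data_offset, data_size, super_offset, recovery_offset, dev_number,
# cnt_corrected_read, device_uuid, devflags, then 7 bytes of pad2.
DEVICE_AREA = struct.Struct("<QQQQII16sB7x")
WRITE_MOSTLY = 1  # the only devflags bit described in the table


def parse_device_area(sb: bytes) -> dict:
    (data_offset, data_size, super_offset, recovery_offset, dev_number,
     cnt_corrected_read, device_uuid, devflags) = \
        DEVICE_AREA.unpack_from(sb, 0x80)
    return {
        "data_offset": data_offset,
        "data_size": data_size,
        "super_offset": super_offset,
        "recovery_offset": recovery_offset,
        "dev_number": dev_number,
        "cnt_corrected_read": cnt_corrected_read,
        "device_uuid": device_uuid.hex(),
        "write_mostly": bool(devflags & WRITE_MOSTLY),
    }


# Synthetic example: a write-mostly member whose superblock is at sector 8.
sb = bytearray(256)
DEVICE_AREA.pack_into(sb, 0x80, 2048, 1000000, 8, 0, 2, 0, bytes(16), 1)
device = parse_device_area(bytes(sb))

if __name__ == "__main__":
    for key, value in device.items():
        print(f"{key}: {value}")
```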


Section: Array-State Information area

64 Bytes, Offset 192-255 (0xC0 - 0xFF)

Offset (Hex)  Offset (Dec)  Length (bytes)  Field Name     Data Type  Data Value  Usage/Meaning and Notes
0xC0 - 0xC7   192 - 199     8               utime          __u64                  Update Time(?). Low 40 bits are seconds; high 24 bits are microseconds.
0xC8 - 0xCF   200 - 207     8               events         __u64      #           Event Count for the Array. Updated whenever the superblock is updated. Used by mdadm in re-assembly to detect failed/out-of-sync component devices.
0xD0 - 0xD7   208 - 215     8               resync_offset  __u64      offset #    Offsets before this one (starting from data_offset) are 'known' to be in sync.
0xD8 - 0xDB   216 - 219     4               sb_csum        __u32      #           Checksum of this superblock up to devs[max_dev]. This value will be different for each component device's superblock.
0xDC - 0xDF   220 - 223     4               max_dev        __u32      #           How many devices are part of (or related to) the array
0xE0 - 0xFF   224 - 255     32              pad3           __u8[32]   0           Padding Block 3. Always set to zero when writing.
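
The table says only that sb_csum covers the superblock "up to devs[max_dev]". The Python sketch below shows one plausible way to compute it, and the algorithm is an assumption modelled on the kernel md driver rather than something this page specifies. It sums the covered bytes as little-endian 32-bit words with the sb_csum field treated as zero, adds a trailing 16-bit value if the length is not a multiple of four, and folds any carry back into 32 bits. The function name and sample data are hypothetical.

```python
import struct

CSUM_OFFSET = 0xD8
MAX_DEV_OFFSET = 0xDC


def calc_sb_csum(sb: bytes) -> int:
    """Sketch of a superblock checksum (ASSUMED algorithm, see lead-in)."""
    (max_dev,) = struct.unpack_from("<I", sb, MAX_DEV_OFFSET)
    size = 256 + 2 * max_dev            # fixed part plus the role entries
    if len(sb) < size:
        raise ValueError("buffer shorter than the superblock it describes")
    buf = bytearray(sb[:size])
    struct.pack_into("<I", buf, CSUM_OFFSET, 0)  # checksum field counts as 0
    words = size // 4
    total = sum(struct.unpack_from(f"<{words}I", buf, 0))
    if size % 4 == 2:                   # odd number of 16-bit role entries
        total += struct.unpack_from("<H", buf, words * 4)[0]
    # Fold the carry bits back in and keep the low 32 bits.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF


# Synthetic superblock with three role entries: 0, 1 and "spare".
sb = bytearray(256 + 2 * 3)
struct.pack_into("<II", sb, 0x00, 0xA92B4EFC, 1)
struct.pack_into("<I", sb, MAX_DEV_OFFSET, 3)
struct.pack_into("<3H", sb, 0x100, 0, 1, 0xFFFF)
struct.pack_into("<I", sb, CSUM_OFFSET, calc_sb_csum(bytes(sb)))
(stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)

if __name__ == "__main__":
    print("checksum matches:", stored == calc_sb_csum(bytes(sb)))
```

Because the stored checksum is zeroed before summing, a reader can verify a superblock by recomputing the value and comparing it with the stored field.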


Section: Device-Roles (Positions-in-Array) area

Length: Variable number of bytes (but at least 768 bytes?)
2 Bytes per device in the array, including both spare-devices and faulty-devices

?? Bytes, Offset 256-??? (0x100 - 0x???)

Offset (Hex)   Offset (Dec)  Length (bytes)  Field Name  Data Type  Data Value  Usage/Meaning and Notes
0x100 - 0x???  256 - ???     ?               dev_roles   __u16                  Role or Position of device in the array. 0xFFFF means "spare". 0xFFFE means "faulty".
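
As an illustration, the role entries can be read as max_dev little-endian 16-bit values starting at offset 0x100. In this Python sketch the names are hypothetical and the data is synthetic. Using max_dev (offset 0xDC) as the entry count is an assumption based on the sb_csum description.

```python
import struct

ROLE_SPARE = 0xFFFF
ROLE_FAULTY = 0xFFFE


def describe_role(role: int) -> str:
    if role == ROLE_SPARE:
        return "spare"
    if role == ROLE_FAULTY:
        return "faulty"
    return f"active, position {role}"


def parse_dev_roles(sb: bytes) -> list:
    """Return one description per 16-bit entry in the dev_roles area."""
    # ASSUMPTION: max_dev gives the number of entries to read.
    (max_dev,) = struct.unpack_from("<I", sb, 0xDC)
    roles = struct.unpack_from(f"<{max_dev}H", sb, 0x100)
    return [describe_role(role) for role in roles]


# Synthetic example: two active devices, one spare, one faulty.
sb = bytearray(256 + 2 * 4)
struct.pack_into("<I", sb, 0xDC, 4)
struct.pack_into("<4H", sb, 0x100, 0, 1, ROLE_SPARE, ROLE_FAULTY)
roles = parse_dev_roles(bytes(sb))

if __name__ == "__main__":
    for index, text in enumerate(roles):
        print(index, text)
```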