Revision as of 16:32, 8 December 2007
The /proc/mdstat file shows a snapshot of the kernel's RAID/md state.
querying the status
The kernel md state is easily viewed by running:
    cat /proc/mdstat
It won't hurt. Let's learn how to read the file. Here are some examples:
Example 1:
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md_d0 : active raid5 sde1[0] sdf1[4] sdb1[5] sdd1[2] sdc1[1]
          1250241792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
          bitmap: 0/10 pages [0KB], 16384KB chunk

    unused devices: <none>
Example 2:
    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sda1 sdd1 sdb1
          1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

    unused devices: <none>
Example 3:
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md1 : active raid1 sdb2 sda2
          136448 blocks [2/2] [UU]

    md2 : active raid1 sdb3 sda3
          129596288 blocks [2/2] [UU]

    md3 : active raid5 sdl1 sdk1 sdj1 sdi1 sdh1 sdg1 sdf1 sde1 sdd1 sdc1
          1318680576 blocks level 5, 1024k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

    md0 : active raid1 sdb1 sda1
          16787776 blocks [2/2] [UU]

    unused devices: <none>
Example 4:
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md127 : active raid5 sdh1 sdg1 sdf1 sde1 sdd1 sdc1
          1464725760 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
          [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec

    unused devices: <none>
Example 5:
    Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [raid6]
    md0 : active raid6 sdf1 sde1 sdd1 sdc1 sdb1 sda1 hdb1
          1225557760 blocks level 6, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
          bitmap: 0/234 pages [0KB], 512KB chunk

    unused devices: <none>
Example 6:
    Personalities : [raid1]
    md1 : active raid1 sde1(F) sdg1 sdb1 sdd1 sdc1
          488383936 blocks [6/4] [_UUUU_]

    unused devices: <none>
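Before reading individual lines it can help to group the output by array. The following is only a minimal sketch: it assumes the usual layout in which each array starts on an unindented line of the form "name : ..." and its remaining lines are indented beneath it. The function name split_mdstat and the SAMPLE text (copied from the second example) are illustrative and not part of any md tooling.

```python
def split_mdstat(text):
    """Group mdstat text into {array_name: [lines]}.

    The Personalities and "unused devices" lines are skipped.
    """
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            # A blank line ends the current array's stanza.
            current = None
            continue
        if line.startswith(("Personalities", "unused devices")):
            current = None
            continue
        if not line[0].isspace() and " : " in line:
            # Unindented "name : ..." starts a new array.
            current = line.split(" : ", 1)[0]
            arrays[current] = [line]
        elif current is not None:
            # Indented continuation line belonging to the current array.
            arrays[current].append(line.strip())
    return arrays


SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1 sdd1 sdb1
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
"""

for name, lines in split_mdstat(SAMPLE).items():
    print(name, lines)
```

On a Linux host the same function could be fed the contents of /proc/mdstat instead of SAMPLE.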
The Personalities line simply tells you which RAID levels the kernel currently supports. This can be changed by changing the raid modules or recompiling the kernel. Known personalities: [raid0] [raid1] [raid4] [raid5] [raid6] [linear]
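As a small illustration, the bracketed names can be pulled out of the Personalities line with a regular expression. This is a sketch only; parse_personalities is a hypothetical helper name.

```python
import re


def parse_personalities(line):
    """Return the personalities listed on a 'Personalities :' line."""
    if not line.startswith("Personalities"):
        raise ValueError("not a Personalities line")
    # Each personality appears as a bracketed token, e.g. [raid5].
    return re.findall(r"\[([^\]]+)\]", line)


print(parse_personalities("Personalities : [raid1] [raid6] [raid5] [raid4]"))
# prints ['raid1', 'raid6', 'raid5', 'raid4']
```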
md device line
Each array is then described. From example 1:
md_d0 : active raid5 sde1[0] sdf1[4] sdb1[5] sdd1[2] sdc1[1]
This means we're looking at the device /dev/md_d0.
It is active or 'started'. An inactive array is usually faulty. Stopped arrays aren't visible here.
It is a raid5 array and the component devices are:
    /dev/sde1 is device 0
    /dev/sdf1 is device 4
    /dev/sdb1 is device 5
    /dev/sdd1 is device 2
    /dev/sdc1 is device 1
The order in which the devices appear in this line means nothing.
The raid role numbers [#] following each device indicate its role, or function, within the raid set. To identify spare devices, first look at the [#/#] value on the config/status line (see below). The first number is the number of devices the array needs to be fully operational; let's call it "n". Devices with role numbers 0, 1, ..., n-1 belong to the working array; any device with role number n or higher is a spare.
Notice that there is no device 3 in example 1. Presumably that device failed at some point and was replaced by device 5 (this has not been verified).
If a device fails, it is marked with (F) after the [#] (see example 6, where sde1 has failed). The spare that replaces it will be the device with the lowest role number of n or higher that is not marked (F). Once the resync operation is complete, the devices' role numbers are swapped.
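The md device line can be split into its parts with a short parser. This is a sketch under stated assumptions: it handles only the "name : active level devices..." form seen in the examples, and it expects each device token to be a name optionally followed by a role number in square brackets and an optional (F) flag, as the text describes. The names parse_md_device_line and DEVICE_RE are made up for this illustration.

```python
import re

# Device token: name, optional [role number], optional (F) failure flag.
DEVICE_RE = re.compile(r"^(?P<name>\w+)(?:\[(?P<role>\d+)\])?(?P<failed>\(F\))?$")


def parse_md_device_line(line):
    """Parse a line such as 'md_d0 : active raid5 sde1[0] sdf1[4] ...'."""
    array, sep, rest = line.partition(" : ")
    if not sep:
        raise ValueError("not an md device line")
    fields = rest.split()
    state, level = fields[0], fields[1]
    devices = []
    for token in fields[2:]:
        m = DEVICE_RE.match(token)
        if m is None:
            raise ValueError(f"unrecognised device token: {token!r}")
        role = m.group("role")
        devices.append({
            "name": m.group("name"),
            "role": int(role) if role is not None else None,
            "failed": m.group("failed") is not None,
        })
    return {"array": array.strip(), "state": state, "level": level, "devices": devices}


# Role numbers taken from the device list given for example 1.
info = parse_md_device_line("md_d0 : active raid5 sde1[0] sdf1[4] sdb1[5] sdd1[2] sdc1[1]")
by_role = sorted(info["devices"], key=lambda d: d["role"])
print([(d["role"], d["name"]) for d in by_role])
# prints [(0, 'sde1'), (1, 'sdc1'), (2, 'sdd1'), (4, 'sdf1'), (5, 'sdb1')]
```

Sorting by role number makes the gap at role 3 easy to spot, and it shows that the order of devices on the line carries no meaning.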
md config/status line
The next line continues the description of the array; in example 1 it is:
1250241792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
This line provides some basic data about the fixed size and layout: the usable size of the array is 1250241792 blocks; the array uses a version 1.2 superblock; and it is a level 5 array (this is redundant with the md device line) with a chunk size of 64k using algorithm 2. (See RAID_Creation for more details.)
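The fixed fields on this line can be extracted as follows. This is a rough sketch: the pattern is based only on the example lines on this page, where the superblock version and the "level ..., ... chunk, algorithm ..." group are sometimes absent (for instance on the raid1 arrays). parse_config and CONFIG_RE are hypothetical names.

```python
import re

# "super X" and the level/chunk/algorithm group are optional in the examples.
CONFIG_RE = re.compile(
    r"(?P<blocks>\d+) blocks"
    r"(?: super (?P<superblock>[\d.]+))?"
    r"(?: level (?P<level>\d+), (?P<chunk_k>\d+)k chunk, algorithm (?P<algorithm>\d+))?"
)


def parse_config(line):
    """Return the size and layout fields of a config/status line."""
    m = CONFIG_RE.search(line)
    if m is None:
        raise ValueError("no '<n> blocks' field found")
    fields = {}
    for key, value in m.groupdict().items():
        if value is None or key == "superblock":
            fields[key] = value  # keep the version as text, e.g. "1.2"
        else:
            fields[key] = int(value)
    return fields


print(parse_config(
    "1250241792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]"
))
print(parse_config("136448 blocks [2/2] [UU]"))
```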
The final two entries on this line, [5/5] and [UUUUU], are more dynamic.
[n/m] means that ideally the array would have n devices; currently, m devices are in use. When n == m, things are good. In example 2, [4/3] shows an array that should have 4 devices but has only 3 in use.
[UUUUU] represents the status of each device: U for up, _ for down. Examples 2 and 6 show 'degraded' arrays with some devices 'down'.
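A degraded array can be detected from these two entries alone. The sketch below reads the first number as the configured device count and the second as the number currently in use, which is what the examples show ([4/3] appears alongside [UUU_]). Down devices are reported as zero-based positions in the flag string. parse_status and STATUS_RE are illustrative names.

```python
import re

STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_status(line):
    """Extract the [n/m] counts and the per-device U/_ flags."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    wanted, in_use, flags = int(m.group(1)), int(m.group(2)), m.group(3)
    return {
        "wanted": wanted,
        "in_use": in_use,
        # Zero-based positions of '_' in the flag string.
        "down_slots": [i for i, flag in enumerate(flags) if flag == "_"],
        "degraded": in_use < wanted,
    }


print(parse_status("1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]"))
# prints {'wanted': 4, 'in_use': 3, 'down_slots': [3], 'degraded': True}
print(parse_status("488383936 blocks [6/4] [_UUUU_]"))
# prints {'wanted': 6, 'in_use': 4, 'down_slots': [0, 5], 'degraded': True}
```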
If an array has a bitmap, a further line describes its state. Example 1 shows:
bitmap: 0/10 pages [0KB], 16384KB chunk
What would it mean if it showed, e.g., 23/234?
This refers to the in-memory bitmap (basically a cache of what's in the on-disk bitmap -- it allows bitmap operations to be more efficient).
If it's 23/234 that means there are 23 of 234 pages allocated in the in-memory bitmap. The pages are allocated on demand, and get freed when they're empty (all zeroes). The in-memory bitmap uses 16 bits for each bitmap chunk to count all ongoing writes to the chunk, so it's actually up to 16 times larger than the on-disk bitmap.
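To make the page counts concrete, the bitmap line can be parsed and the allocated memory estimated. This is a sketch only. The 4 KB page size used in estimated_memory_kb is an assumption (it depends on the architecture), and parse_bitmap, estimated_memory_kb and BITMAP_RE are hypothetical names.

```python
import re

BITMAP_RE = re.compile(r"bitmap:\s+(\d+)/(\d+) pages \[(\d+)KB\], (\d+)KB chunk")


def parse_bitmap(line):
    """Return allocated/total in-memory pages, memory used and chunk size."""
    m = BITMAP_RE.search(line)
    if m is None:
        raise ValueError("not a bitmap line")
    allocated, total, memory_kb, chunk_kb = (int(g) for g in m.groups())
    return {
        "allocated_pages": allocated,
        "total_pages": total,
        "memory_kb": memory_kb,
        "chunk_kb": chunk_kb,
    }


def estimated_memory_kb(pages, page_size_kb=4):
    """Memory held by the allocated pages, ASSUMING 4 KB pages by default."""
    return pages * page_size_kb


print(parse_bitmap("bitmap: 0/10 pages [0KB], 16384KB chunk"))
print(estimated_memory_kb(23))  # 23 allocated pages under the 4 KB assumption
```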
Example 4 clearly shows some recovery activity:
[==>..................] recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec
The first part is simply a graphical representation of the progress. The rest of the line is fairly self-explanatory. The finish time is only an approximation since the resync speed will vary according to other I/O demands. See the resync page for more details.
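The figures on the recovery line can be cross-checked against each other: the percentage is the ratio of the two numbers in parentheses, and the finish estimate is the remaining amount divided by the current speed. The sketch below recomputes both from the example line, assuming the two counts are in the same K units as the speed, which the arithmetic bears out. parse_recovery and RECOVERY_RE are illustrative names.

```python
import re

RECOVERY_RE = re.compile(
    r"(?P<action>\w+)\s*=\s*(?P<percent>[\d.]+)%\s+"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s+"
    r"finish=(?P<finish>[\d.]+)min\s+speed=(?P<speed>\d+)K/sec"
)


def parse_recovery(line):
    """Parse a progress line and recompute its percentage and finish time."""
    m = RECOVERY_RE.search(line)
    if m is None:
        raise ValueError("not a progress line")
    done, total, speed = int(m["done"]), int(m["total"]), int(m["speed"])
    return {
        "action": m["action"],
        "reported_percent": float(m["percent"]),
        "reported_finish_min": float(m["finish"]),
        # Recomputed values, to compare with the reported ones.
        "computed_percent": 100.0 * done / total,
        "computed_finish_min": (total - done) / speed / 60.0,
    }


line = ("[==>..................]  recovery = 12.6% (37043392/292945152) "
        "finish=127.5min speed=33440K/sec")
result = parse_recovery(line)
print(f"{result['computed_percent']:.2f}% done, "
      f"about {result['computed_finish_min']:.1f} min remaining")
```

For the example line the recomputed values agree with the reported 12.6% and 127.5min to within rounding.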