Write-intent bitmap

 
'''random comments for now'''
 
Bitmaps optimise rebuild time after a crash, or after removing and re-adding a device.  They do not improve normal read/write performance, and cause a small degradation in performance.

When an array has a bitmap, a device can be removed and re-added and only the blocks changed since the removal (as recorded in the bitmap) will be resynced.
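
For example (a minimal sketch, not from the original text; ''/dev/md0'' and ''/dev/sdb1'' are placeholder names), a member can be failed and removed, and later re-added; with a bitmap present, only the blocks written in the meantime are resynced:

   # fail and remove one member, then bring it back; the bitmap
   # limits the subsequent resync to blocks written while it was out
   mdadm /dev/md0 --fail /dev/sdb1
   mdadm /dev/md0 --remove /dev/sdb1
   mdadm /dev/md0 --re-add /dev/sdb1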
 
Note that bitmap support is not available for raid0, because it would be meaningless: the bitmap records sections of the array which might be inconsistent, and as raid0 has no redundancy it cannot be inconsistent, so there is nothing to record.
  
 
== Creating Bitmaps ==
 
Use:

   mdadm --grow --bitmap=internal /dev/mdX

This operation is reversible:

   mdadm --grow --bitmap=none /dev/mdX

Bitmaps can also be created externally on an ext3 filesystem (which '''must not''' be on the raid device).
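
A hedged sketch of the external case (the path shown is just an example; the filesystem holding the bitmap file must not be on the array itself):

   # store the write-intent bitmap in a file on a separate ext3 filesystem
   mdadm --grow --bitmap=/mnt/ext3/md0.bitmap /dev/md0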
  
 
== How Bitmaps Work by Q&A ==
 
 
;Why, when you first create a raid1 (mirrored) array from two drives, does mdadm insist on mirroring the contents of the first drive to the second even though the drives are entirely blank (e.g. new drives don't have anything on them)?

:Well... they do have something on them - lots of zeros and ones, or maybe just zeros, or maybe just ones.  Sure, you may not be interested in that data, but it is there.
 
  
 
;In one configuration I have, this takes about 16 hours on a 400Gb drive.  When I do 5 of them simultaneously this takes 2+ days to complete.  Is there some way to tell mdadm that you want to create a mirrored set but skip this rather long initial mirroring process?  I don't really see that it actually accomplishes anything.

:No, there is no way to tell mdadm to skip the initial copying process.  It is not clear to me that you really want to do this(*) (though on the "enough rope" principle the "--assume-clean" option works in --create mode).
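
:For illustration only (device names are placeholders, and for the reasons given below this is generally not recommended), that option looks like:

   # create a mirror without the initial sync; md assumes the members already match
   mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sda1 /dev/sdb1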
 
:I suggest you simply ignore the fact that it is doing the copy.  Just keep using the array as though it wasn't.  If this seems to be impacting over-all system performance, tune ''/proc/sys/dev/raid/speed_*'' (those are global parameters; check ''/sys/devices/virtual/block/md*/md/'' for the parameters established per ''md'' device) to slow it down even more. If you reboot, it should remember where it was up to and restart from the same place (providing you are using a 2.6 kernel).
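
:A hedged illustration of that tuning (the values are arbitrary; ''speed_limit_min'' and ''speed_limit_max'' are in KiB/s):

   # lower the global resync speed limits so background syncs interfere less
   echo 1000  > /proc/sys/dev/raid/speed_limit_min
   echo 10000 > /proc/sys/dev/raid/speed_limit_max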

:If you have 5 of these 400Gb raid1's, then I suspect you really want to avoid the slow resync that happens after a crash.  You should look into adding a bitmap write-intent log.  This requires 2.6.14, and mdadm 2.1, and is as easy as:
 
   mdadm --grow --bitmap=internal /dev/md3
 
 
:while the array is running.
 
 
:This should dramatically reduce resync time, at a possible small cost in write throughput.  Some limited measurements I have done suggest up to 10% slowdown, though usually less.  Possibly some tuning can make it much better.
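
:One possible tuning (a hedged example, not from the original text) is to re-create the bitmap with a larger chunk so fewer bitmap updates are needed; the value passed to --bitmap-chunk is in KiB and the size shown is arbitrary:

   # replace the bitmap with one using 64MB chunks to reduce write overhead
   mdadm --grow --bitmap=none /dev/md3
   mdadm --grow --bitmap=internal --bitmap-chunk=65536 /dev/md3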
  
 
(*)
 
A raid array can suffer from sleeping bad blocks, i.e. blocks that you cannot read, but normally you never do (because they haven't been allocated to a file yet).  When a drive fails, and you are recovering the data onto a spare, hitting that sleeper can kill your array. For this reason it is good to regularly (daily, or weekly, maybe monthly) read through the entire array making sure everything is OK. In 2.6.16 (with complete functionality in 2.6.17) you will be able to trigger a background read-test of the whole array:

   echo check > /sys/block/mdX/md/sync_action

If you were to create an array with --assume-clean, then whenever you run this it will report lots of errors, though you can fix them with
 
   echo repair > /sys/block/mdX/md/sync_action
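
To see whether a check actually found anything, md exports a mismatch count in sysfs (a small illustration, not part of the original page; mdX is a placeholder):

   # count of mismatches found by the last check/repair pass
   cat /sys/block/mdX/md/mismatch_cnt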
 
If you are going to be doing that (and I would recommend it) then you may as well allow the initial sync, especially as you can quite happily ignore the fact that it is happening.
  
 
== Bitmaps and /proc/mdstat ==
 
 
The [[mdstat|/proc/mdstat]] page describes how to interpret the bitmap line.
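
For orientation, the bitmap line looks roughly like this (an illustrative sample, not taken from a live system):

   bitmap: 3/57 pages [12KB], 2048KB chunk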
 
  
== Non bitmap-optimised resyncs ==

It should be possible to do a similar thing to arrays without bitmaps, i.e. if a device is removed and re-added and *no* changes have been made in the interim, then the add should not require a resync.

A patch allows that option. This means that when assembling an array one device at a time (e.g. during device discovery) the array can be enabled read-only as soon as enough devices are available, but extra devices can still be added without causing a resync.

== Used disk space for bitmaps ==

The bitmap uses space that the alignment requirements of the metadata assure us is otherwise unused. For v0.90 metadata, that is limited to 60K.  For 1.x it is 3K. As this is unused disk space, bitmaps can be added to an existing md device without the risk of taking space away from an existing filesystem on that device.
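
To inspect the bitmap stored in that reserved space, mdadm can dump it from a member device (a hedged example; ''/dev/sdb1'' is a placeholder member of the array):

   # show bitmap metadata and how many chunks are currently marked dirty
   mdadm --examine-bitmap /dev/sdb1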
