Write-intent bitmap

From Linux Raid Wiki
Latest revision as of 17:53, 21 March 2011

When an array has a write-intent bitmap, a spindle (a device, often a hard drive) can be removed and re-added, and only the blocks changed since the removal (as recorded in the bitmap) will be resynced.

Therefore a write-intent bitmap reduces rebuild/recovery (md sync) time if:

  • the machine crashes (unclean shutdown)
  • one spindle is disconnected, then reconnected

If one spindle fails and has to be replaced, a bitmap makes no difference: the replacement device holds none of the array's data, so it must be rebuilt in full.

A write-intent bitmap:

  • does not improve performance
  • may degrade write performance; the impact depends on:
    • the amount of data on the RAID device (the bitmap chunk size) mapped to each bit in the bitmap, as reported by cat /proc/mdstat; in other words, the ratio (bitmap size / RAID device size)
    • the workload profile (long sequences of writes are more affected, as spindle heads go back and forth between the data zone and the bitmap zone)
  • can be added or removed at any time
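
To make the chunk-size factor concrete, here is a minimal sketch of the arithmetic. It assumes the on-disk bitmap holds exactly one bit per bitmap chunk and ignores any header; the function names and the 400 GiB example size are illustrative and are not part of mdadm.

```shell
# bitmap_bits DEVICE_KIB CHUNK_KIB
# Number of bitmap bits: one bit per chunk, rounded up.
bitmap_bits() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# bitmap_bytes DEVICE_KIB CHUNK_KIB
# Same figure expressed in bytes (8 bits per byte, rounded up).
bitmap_bytes() {
    bits=$(bitmap_bits "$1" "$2")
    echo $(( (bits + 7) / 8 ))
}

# Example: 400 GiB device (419430400 KiB), 1024 KiB bitmap chunk.
bitmap_bits 419430400 1024    # prints 409600
bitmap_bytes 419430400 1024   # prints 51200
```

A larger chunk gives a smaller bitmap that needs updating less often, but each dirty bit then covers more data to resync.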

Write-intent bitmap support is only available for RAID levels that provide data redundancy. For example, as RAID0 has no redundancy it cannot be inconsistent, so there is nothing to record in such a bitmap.


Creating Bitmaps

Use:

  mdadm --grow --bitmap=internal /dev/mdX

This operation is reversible:

  mdadm --grow --bitmap=none /dev/mdX

Bitmaps can also be stored externally, in a file on an ext3 filesystem (which must not be on the RAID device itself).
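
Before adding a bitmap it can be handy to check whether the array already has one. The helper below is a sketch under the assumption that /proc/mdstat lists each array as a stanza starting with "mdN : ..." and ending at a blank line, with an indented "bitmap:" line when a bitmap is present. The name has_bitmap is hypothetical, not an mdadm feature.

```shell
# has_bitmap NAME FILE
# Succeeds if array NAME has a "bitmap:" line in the mdstat-style
# text stored in FILE (normally /proc/mdstat).
has_bitmap() {
    awk -v md="$1" '
        $2 == ":" { in_md = ($1 == md); next }   # stanza header, e.g. "md3 : active raid1 ..."
        NF == 0   { in_md = 0 }                  # a blank line ends the stanza
        in_md && $1 == "bitmap:" { found = 1 }   # bitmap line inside the wanted stanza
        END { exit !found }                      # exit status 0 only if a bitmap was seen
    ' "$2"
}

# Intended use (as root; md3 is a placeholder array name):
#   has_bitmap md3 /proc/mdstat || mdadm --grow --bitmap=internal /dev/md3
```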

How Bitmaps Work by Q&A

Q: Why, when you first create a raid1 (mirrored) array from two drives, does mdadm insist on mirroring the contents of the first drive to the second even though the drives are entirely blank (e.g. new drives don't have anything on them)?
A: Well... they do have something on them: lots of zeros and ones, or maybe just zeros, or maybe just ones. Sure, you may not be interested in that data, but it is there.

Q: In one configuration I have, this takes about 16 hours on a 400GB drive. When I do 5 of them simultaneously this takes 2+ days to complete. Is there some way to tell mdadm that you want to create a mirrored set but skip this rather long initial mirroring process? I don't really see that it actually accomplishes anything.
A: No, there is no way to tell mdadm to skip the initial copying process. It is not clear to me that you really want to do this (though on the "enough rope" principle the "--assume-clean" option works in --create mode).

I suggest you simply ignore the fact that it is doing the copy. Just keep using the array as though it wasn't. If this seems to be impacting overall system performance, tune /proc/sys/dev/raid/speed_* (those are global parameters; check /sys/devices/virtual/block/md*/md/ for the parameters set per md device) to slow it down even more. If you reboot, it should remember where it was up to and restart from the same place (provided you are using a 2.6 kernel).

If you have 5 of these 400GB raid1 arrays, then I suspect you really want to avoid the slow resync that happens after a crash. You should look into adding a write-intent bitmap. This requires kernel 2.6.14 and mdadm 2.1 or later, and is as easy as:
  mdadm --grow --bitmap=internal /dev/md3
while the array is running.

This should dramatically reduce resync time, at a possible small cost in write throughput. Some limited measurements I have done suggest up to 10% slowdown, though usually less. Possibly some tuning can make it much better.
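
As a rough guide to what the speed tunables mean in practice, the sketch below estimates how long a full resync takes at a given rate. It assumes the limits are expressed in KiB per second and that the resync actually runs at that rate; resync_minutes is a hypothetical helper, and the sizes are examples only.

```shell
# resync_minutes SIZE_KIB SPEED_KIB_PER_SEC
# Whole minutes needed to copy SIZE_KIB at a steady SPEED_KIB_PER_SEC
# (integer arithmetic, rounded down).
resync_minutes() {
    echo $(( $1 / $2 / 60 ))
}

# A 400 GB device is about 390625000 KiB.
resync_minutes 390625000 10000   # prints 651  (roughly 11 hours)
resync_minutes 390625000 1000    # prints 6510 (roughly 4.5 days)

# Changing the limits needs root, for example:
#   echo 5000 > /proc/sys/dev/raid/speed_limit_max                # global cap
#   echo 5000 > /sys/devices/virtual/block/md3/md/sync_speed_max  # cap for md3 only
```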

Bitmaps and /proc/mdstat

The /proc/mdstat page describes how to interpret the bitmap line.
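
For scripting, the page counts can be pulled out of the bitmap line with a small filter. This is a sketch that assumes the line has the form "bitmap: A/B pages [..KB], ..KB chunk", where, on the usual reading, A is the number of pages currently allocated for the in-memory bitmap and B is the total; the mdstat page remains the reference for the exact meaning. The name bitmap_pages is hypothetical.

```shell
# bitmap_pages
# Reads mdstat-style text on stdin and prints "ALLOCATED TOTAL"
# for every bitmap line it finds.
bitmap_pages() {
    sed -n 's|^ *bitmap: \([0-9][0-9]*\)/\([0-9][0-9]*\) pages.*|\1 \2|p'
}

# Sample line; on a live system use: bitmap_pages < /proc/mdstat
echo '      bitmap: 0/117 pages [0KB], 1024KB chunk' | bitmap_pages   # prints 0 117
```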

Non bitmap-optimised resyncs

It should be possible to do a similar thing for arrays without bitmaps: if a device is removed and re-added and no changes have been made in the interim, then the add should not require a resync.

A patch allows that option. This means that when assembling an array one device at a time (e.g. during device discovery) the array can be enabled read-only as soon as enough devices are available, but extra devices can still be added without causing a resync.

Used disk space for bitmaps

An internal bitmap uses space that the alignment requirements of the metadata guarantee is otherwise unused. For v0.90 metadata, that is limited to 60K. For 1.x it is 3K. As this disk space is unused, a bitmap can be added to an existing md device without the risk of taking space away from an existing filesystem on that device.
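
These space limits put a floor on the bitmap chunk size for a given device. The sketch below estimates that floor under several assumptions that are not from this page: one bit per chunk, no space taken by any bitmap header, and chunk sizes restricted to powers of two. min_chunk_kib is a hypothetical helper, so treat its results as a lower bound rather than what mdadm will pick.

```shell
# min_chunk_kib DEVICE_KIB SPACE_BYTES
# Smallest power-of-two chunk size (in KiB) whose bitmap, at one bit
# per chunk, fits in SPACE_BYTES.
min_chunk_kib() {
    bits=$(( $2 * 8 ))                    # bits available in the reserved space
    need=$(( ($1 + bits - 1) / bits ))    # KiB each bit must cover, rounded up
    chunk=1
    while [ "$chunk" -lt "$need" ]; do    # round up to the next power of two
        chunk=$(( chunk * 2 ))
    done
    echo "$chunk"
}

# 400 GiB device (419430400 KiB):
min_chunk_kib 419430400 61440   # 60K of space (v0.90): prints 1024
min_chunk_kib 419430400 3072    # 3K of space (1.x):    prints 32768
```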

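See also

  • Growing - Adding partitions
  • md manpage (invoke "man md"), section titled "BITMAP WRITE-INTENT LOGGING"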
