Hardware issues related to disk hotplugging are described in the Hotswap chapter of the Hardware issues page.

Linux RAID fully supports hotplug operations starting with the Hot-unplug branch of mdadm version 3.1.2.


mdadm versions < 3.1.2

Older versions of mdadm already support hotplug and hot-unplug, but full automation requires some scripting. First, let's see what mdadm provides by trying its features manually from the command line:

Hot-unplug from command line

  • If the physical disk is still alive:
mdadm --fail /dev/mdX /dev/sdYZ
mdadm --remove /dev/mdX /dev/sdYZ 

Do this for every RAID array that contains partitions of the failed disk. The disk can then be hot-unplugged without any problems.

  • If the physical disk is dead or already unplugged, just run:
mdadm /dev/mdX --fail detached --remove detached
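Repeating the fail/remove pair by hand for every array gets tedious when a disk carries several RAID partitions. The helpers below are a minimal sketch, not part of mdadm. They assume sdX-style partition names (sda1, sda2, ...) and the /proc/mdstat line format shown later on this page. The names members_of_disk and fail_and_remove_disk, and the DRY_RUN switch, are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm): fail and remove every partition
# of one disk from all md arrays. Assumes sdX-style names (sda1, sda2, ...)
# and /proc/mdstat array lines like "md0 : active raid1 sda1[0] sdb1[1]".

# Print "mdX partition" for each array member that lives on disk $1.
# $2 is the mdstat file to read (defaults to /proc/mdstat).
members_of_disk() {
    awk -v disk="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                b = index($i, "[")
                if (b == 0) continue                  # not a "name[slot]" token
                part = substr($i, 1, b - 1)           # "sda1[0]" -> "sda1"
                rest = substr(part, length(disk) + 1) # "sda1" -> "1"
                if (substr(part, 1, length(disk)) == disk && rest ~ /^[0-9]+$/)
                    print $1, part
            }
        }' "${2:-/proc/mdstat}"
}

# Fail and remove each member found above; with DRY_RUN=1 only print the commands.
fail_and_remove_disk() {
    members_of_disk "$1" "$2" | while read -r md part; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mdadm --fail /dev/$md /dev/$part"
            echo "mdadm --remove /dev/$md /dev/$part"
        else
            mdadm --fail "/dev/$md" "/dev/$part"
            mdadm --remove "/dev/$md" "/dev/$part"
        fi
    done
}
```

After sourcing these functions into a root shell, `DRY_RUN=1 fail_and_remove_disk sda` prints the commands for the still-alive case without running them. For a dead disk, the "detached" form above is still the one to use.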

Fully automated hotplug and hot-unplug using UDEV rules

If you need fully automatic handling of hotplug and hot-unplug events, you can use the UDEV "add" and "remove" events.

Note: the following code has been validated on Debian 5 (Lenny) with kernel 2.6.26 and udevd version 125.

Important notes:

  • The rule for the "add" event MUST be placed in a file that is processed after the "persistent_storage.rules" file, because it uses the ENV{ID_FS_TYPE} condition, which is set by persistent_storage.rules while the "add" event is processed.
  • The rule for the "remove" event can reside in any file in the UDEV rules chain, but let's keep it together with the "add" rule :-)

For this reason, on Debian Lenny I placed the mdadm hotplug rules in the file /etc/udev/rules.d/66-mdadm-hotplug.rules. This is its content:

ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="END_66_MDADM"
ACTION=="add", RUN+="/usr/local/sbin/handle-add-old $env{DEVNAME}"
ACTION=="remove", RUN+="/usr/local/sbin/handle-remove-old $name"
LABEL="END_66_MDADM"

(these rules are based on the UDEV rules contained in the hot-unplug patches by Doug Ledford)
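The ordering requirement in the notes above relies on udev reading its rules files in lexical order of their file names. A small check can confirm that the mdadm rules file sorts after the persistent-storage one. This is a hypothetical helper that compares base names only. The file name 60-persistent-storage.rules in the example is an assumption about how persistent_storage.rules is named on your system.

```shell
#!/bin/sh
# Hypothetical check: udev processes rules files in lexical (C-locale) order
# of their names, so the file given as $2 must sort after the file given as $1.
# Only the base names are compared.
rules_after() {
    earlier=$(basename "$1")
    later=$(basename "$2")
    last=$(printf '%s\n%s\n' "$earlier" "$later" | LC_ALL=C sort | tail -n 1)
    [ "$last" = "$later" ] && [ "$earlier" != "$later" ]
}
```

For example, `rules_after 60-persistent-storage.rules 66-mdadm-hotplug.rules && echo "ordering OK"` succeeds because "66" sorts after "60".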

And here are the scripts which are called from these rules:

#!/bin/sh
# This is /usr/local/sbin/handle-add-old
# LOGGER and MDADM are not set in the original snippet; these values are assumptions.
LOGGER="/usr/bin/logger -t mdadm-hotplug"
MDADM=/sbin/mdadm
mdline=`$MDADM --examine --scan $1` # mdline contains something like "ARRAY /dev/mdX level=raid1 num-devices=2 UUID=..."
mddev=${mdline#* }                  # delete "ARRAY " and return the result as mddev
mddev=${mddev%% *}                  # delete everything after /dev/mdX
$LOGGER "$0 $1"
if [ -n "$mddev" ]; then
   $LOGGER "Adding $1 into RAID device $mddev"
   log=`$MDADM -a $mddev $1 2>&1`
   $LOGGER "$log"
fi

#!/bin/sh
# This is /usr/local/sbin/handle-remove-old
# LOGGER and MDADM are not set in the original snippet; these values are assumptions.
LOGGER="/usr/bin/logger -t mdadm-hotplug"
MDADM=/sbin/mdadm
$LOGGER "$0 $1"
mdline=`grep "$1\[" /proc/mdstat` # mdline contains something like "md0 : active raid1 sda1[0] sdb1[1]"
mddev=${mdline% :*}               # delete everything from " :" to the end of the line and return the result as mddev
[ -n "$mddev" ] || exit 0         # the partition is not a member of any array
$LOGGER "$0: Trying to remove $1 from $mddev"
log=`$MDADM /dev/$mddev --fail detached --remove detached 2>&1`
$LOGGER "$log"
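Both scripts extract the array name using only POSIX parameter expansion. Wrapping the two expansions in small functions makes it easy to try them on sample lines before installing the scripts. The function names below are hypothetical; the expansions are the same ones used above.

```shell
#!/bin/sh
# The parameter expansions from handle-add-old and handle-remove-old,
# wrapped in hypothetical functions so they can be tried on sample text.

# "ARRAY /dev/md0 level=raid1 num-devices=2 UUID=..." -> "/dev/md0"
mddev_from_examine() {
    mddev=${1#* }        # drop the shortest prefix ending in a space ("ARRAY ")
    echo "${mddev%% *}"  # drop the longest suffix starting with a space
}

# "md0 : active raid1 sda1[0] sdb1[1]" -> "md0"
mddev_from_mdstat() {
    echo "${1% :*}"      # drop the shortest suffix starting with " :"
}
```

An empty input yields an empty result, which is why handle-add-old tests `[ -n "$mddev" ]` before calling mdadm. When looking up a partition in /proc/mdstat, a plain grep for sda1 would also match sda10, so matching the name followed by "[" is safer.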

mdadm versions >= 3.1.2

The hot-unplug support introduced in mdadm 3.1.2 removes the need for the scripting shown above. If your Linux distribution ships this or a later version of mdadm, hotplug and hot-unplug should work fully automatically, without any hassle.
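To decide which part of this page applies, you can compare the installed mdadm version with 3.1.2. This is a sketch that assumes the version banner has the form "mdadm - v3.1.4 - 31st August 2010". Older releases print that banner on stderr, hence the 2>&1 in the usage line. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: compare the installed mdadm version with 3.1.2.

# Extract "3.1.4" from a banner like "mdadm - v3.1.4 - 31st August 2010".
parse_mdadm_version() {
    echo "$1" | sed -n 's/^mdadm - v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if dotted version $1 >= $2, comparing numeric components in order
# (missing components count as 0).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

Usage: `v=$(parse_mdadm_version "$(mdadm --version 2>&1)"); version_ge "$v" 3.1.2 && echo "no hotplug scripts needed"`. An unparsable banner gives an empty version, which compares as older than 3.1.2.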

Examples of behavior WITHOUT the automatic hotplug/hot-unplug

Let's assume the following RAID configuration:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      3903680 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

md0 contains the system; md1 is for data but is not used yet.


If we hot-unplug the disk /dev/sda, /proc/mdstat shows:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

We see that sda1 now has role [2]. Since this two-disk RAID1 uses only roles [0] and [1], [2] means "spare disk", and it is marked (F)ailed.

But why does the system think that /dev/sda2 in /dev/md1 is still OK? Because my system has not tried to access /dev/md1 yet (I have no data on it). /dev/sda2 is marked as faulty automatically as soon as I try to access /dev/md1:

# dd if=/dev/md1 of=/dev/null bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.0184819 s, 0.1 kB/s
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[2](F) sdb2[1]
      224612672 blocks [2/1] [_U]
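The (F) markers in this output can be picked out automatically, for example to see which members still need to be removed. This is a sketch that assumes the /proc/mdstat line format shown above. The name failed_members is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "array member slot" for every member marked (F)
# in /proc/mdstat (or in the file given as $1).
failed_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i !~ /\(F\)$/) continue             # only failed members
            b = index($i, "["); e = index($i, "]")
            name = substr($i, 1, b - 1)              # "sda1[2](F)" -> "sda1"
            slot = substr($i, b + 1, e - b - 1)      # "sda1[2](F)" -> "2"
            print $1, name, slot
        }
    }' "${1:-/proc/mdstat}"
}
```

For the /proc/mdstat output above, failed_members prints "md0 sda1 2" and "md1 sda2 2".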

At any point after the disk has been unplugged, its partitions can be removed from an array only with this command:

# mdadm /dev/md0 --fail detached --remove detached
mdadm: hot removed 8:1
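The command above only handles md0. In this example md1 also holds a partition of the unplugged disk, so the same command has to be repeated for it. A loop over every array listed in /proc/mdstat does that. This is a sketch; remove_detached_all and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: run "--fail detached --remove detached" on every array
# listed in /proc/mdstat (or in the file given as $1). DRY_RUN=1 only prints.
remove_detached_all() {
    for md in $(awk '$1 ~ /^md/ && $2 == ":" { print $1 }' "${1:-/proc/mdstat}"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mdadm /dev/$md --fail detached --remove detached"
        else
            mdadm "/dev/$md" --fail detached --remove detached
        fi
    done
}
```

Running the detached form on an array that has no detached members is harmless, so the loop does not need to know which arrays were affected.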


(To be finished: an example of how the kernel assigns a new drive letter to the same disk we have just unplugged, because it still considers /dev/sda to be in use ...)

