Hotplug
Hardware issues of disk hotplugging are described in the Hotswap chapter of the Hardware issues page.
Linux RAID fully supports hotplug operations starting with the Hot-unplug branch of mdadm version 3.1.2.
mdadm versions < 3.1.2
Hotplug and hot-unplug from command line
In mdadm versions < 3.1.2, the possibilities for handling hotplug are limited:
- If the physical disk is still alive:

mdadm --fail /dev/mdX /dev/sdYZ
mdadm --remove /dev/mdX /dev/sdYZ

Do this for all RAIDs containing partitions of the failed disk. Then the disk can be hot-unplugged without any problems.
- If the physical disk is dead or already unplugged, just run:

mdadm /dev/mdX --fail detached --remove detached
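The "do this for all RAIDs" step can be scripted by parsing /proc/mdstat for arrays that contain partitions of the failed disk. Below is a minimal dry-run sketch (it only prints the commands it would run); the function name and the embedded sample mdstat content are illustrative assumptions:

```shell
#!/bin/bash
# Sketch (dry run): print the fail/remove commands for every array
# that contains a partition of the given disk. The function name and
# the sample mdstat content are assumptions for illustration.
list_fail_remove() {
    local disk=$1 mdstat=$2 line word mddev part
    set -f                              # disable globbing so "sda1[0]" is not expanded
    while read -r line; do
        mddev=${line%% *}               # array name, e.g. "md0"
        for word in $line; do
            part=${word%%\[*}           # "sda1[0]" -> "sda1"
            case $part in
                ${disk}[0-9]*)
                    echo "mdadm --fail /dev/$mddev /dev/$part"
                    echo "mdadm --remove /dev/$mddev /dev/$part"
                    ;;
            esac
        done
    done <<< "$mdstat"
    set +f
}

sample='md0 : active raid1 sda1[0] sdb1[1]
md1 : active raid1 sda2[0] sdb2[1]'
list_fail_remove sda "$sample"
```

In real use the sample would be replaced by the md lines of the live /proc/mdstat, and the `echo` dropped once the printed commands look right.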
Fully automated hotplug and hot-unplug using UDEV rules
If you need fully automatic handling of hot-plug and hot-unplug events, the udev "add" and "remove" events can be used for this.
Note: the following code has been validated on Debian 5 (Lenny), with kernel 2.6.26 and udevd version 125.
Important notes:
- The rule for the "add" event MUST be placed in a file that sorts after the "persistent_storage.rules" file, because it uses the ENV{ID_FS_TYPE} condition, which is set by persistent_storage.rules during "add" event processing.
- The rule for "remove" event can reside in any file in the UDEV rules chain, but let's keep it together with the "add" rule :-)
For this reason, on Debian Lenny I placed the mdadm hotplug rules in the file /etc/udev/rules.d/66-mdadm-incremental.rules. This is the content of the file:

SUBSYSTEM!="block", GOTO="END_66_MDADM"
ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="END_66_MDADM"
ACTION=="add", RUN+="/usr/local/sbin/handle-add-old $env{DEVNAME}"
ACTION=="remove", RUN+="/usr/local/sbin/handle-remove-old $name"
LABEL="END_66_MDADM"
(these rules are based on the UDEV rules contained in the hot-unplug patches by Doug Ledford)
And here are the scripts which are called from these rules:
#!/bin/bash
# This is /usr/local/sbin/handle-add-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
mdline=`$MDADM --examine --scan $1`
# mdline contains something like "ARRAY /dev/md? level=raid1 num-devices=2 UUID=..."
mddev=${mdline#* }     # delete "ARRAY " and return the result as mddev
mddev=${mddev%% *}     # delete everything behind /dev/mdX
$LOGGER "$0 $1"
if [ -n "$mddev" ]; then
    $LOGGER "Adding $1 into RAID device $mddev"
    log=`$MDADM -a $mddev $1 2>&1`
    $LOGGER "$log"
fi
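The two parameter expansions in the script can be checked in isolation. A minimal sketch, using a made-up `--examine --scan` output line (the UUID and array name are assumptions):

```shell
#!/bin/bash
# Sketch of the parameter expansions used in handle-add-old,
# applied to a made-up "mdadm --examine --scan" output line.
mdline="ARRAY /dev/md0 level=raid1 num-devices=2 UUID=1234abcd"
mddev=${mdline#* }    # drop the shortest leading "* " match, i.e. the "ARRAY " word
mddev=${mddev%% *}    # drop the longest trailing " *" match, leaving "/dev/md0"
echo "$mddev"         # prints /dev/md0
```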
#!/bin/bash
# This is /usr/local/sbin/handle-remove-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
$LOGGER "$0 $1"
mdline=`grep $1 /proc/mdstat`
# mdline contains something like "md0 : active raid1 sda1[0] sdb1[1]"
mddev=${mdline% :*}    # delete everything from " :" till the end of line and return the result as mddev
$LOGGER "$0: Trying to remove $1 from $mddev"
log=`$MDADM /dev/$mddev --fail detached --remove detached 2>&1`
$LOGGER "$log"
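Likewise, the suffix-stripping expansion in the remove script can be tried on a sample /proc/mdstat line (illustrative only):

```shell
#!/bin/bash
# Sketch of the "${var% :*}" expansion used in handle-remove-old,
# applied to a sample /proc/mdstat line (illustrative only).
mdline="md0 : active raid1 sda1[0] sdb1[1]"
mddev=${mdline% :*}   # drop the " : active ..." suffix, leaving "md0"
echo "$mddev"         # prints md0
```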
Examples of behavior
Let's have the following RAID configuration:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      3903680 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]
md0 contains the system; md1 is for data (but is not used yet).
If we hot-unplug the disk /dev/sda, the /proc/mdstat will show:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]
md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]
We see that sda1 now has role [2]. Since a two-device RAID1 needs only components [0] and [1], role [2] means "spare disk", and this one is marked (F)ailed.
But why does the system think that /dev/sda2 in /dev/md1 is still OK? Because my system has not tried to access /dev/md1 yet (I have no data on /dev/md1). /dev/sda2 will be marked as faulty automatically as soon as I try to access /dev/md1:
# dd if=/dev/md1 of=/dev/null bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.0184819 s, 0.1 kB/s
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]
md1 : active raid1 sda2[2](F) sdb2[1]
      224612672 blocks [2/1] [_U]
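A degraded array can be spotted programmatically by the underscore in its status field (e.g. [_U]). A minimal sketch; the function name and the embedded sample content are illustrative assumptions, not part of mdadm:

```shell
#!/bin/bash
# Sketch: report degraded arrays by looking for "_" in the status
# field (e.g. [_U]). Function name and sample are assumptions.
degraded_arrays() {
    local line current=""
    while read -r line; do
        case $line in
            md*)         current=${line%% *} ;;  # remember the array name
            *"["*_*"]"*) echo "$current" ;;      # status like [_U] -> degraded
        esac
    done
}

sample='md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]
md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]'

degraded_arrays <<< "$sample"
```

In real use one would feed it the live file with `degraded_arrays < /proc/mdstat`.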
At any point after the disk has been unplugged, we can remove its partitions from an array with just this command:

# mdadm /dev/md0 --fail detached --remove detached
mdadm: hot removed 8:1

(The "8:1" in the message is the major:minor number of the removed component, here /dev/sda1.)
mdadm versions > 3.1.2
(to be finished)