Hotplug

Hardware issues of disk hotplugging are described in the Hotswap chapter of the Hardware issues page.

Linux RAID fully supports hotplug operations starting with the Hot-unplug branch of mdadm version 3.1.2.

mdadm versions < 3.1.2

Hotplug and hot-unplug from command line

In mdadm versions < 3.1.2, the possibilities for handling hotplug are limited:

  • If the physical disk is still alive:
mdadm --fail /dev/mdX /dev/sdYZ
mdadm --remove /dev/mdX /dev/sdYZ 

Do this for all RAIDs containing partitions of the failed disk. Then the disk can be hot-unplugged without any problems (a scripted sketch covering all arrays at once follows this list).

  • If the physical disk is dead or unplugged, just do
mdadm /dev/mdX --fail detached --remove detached
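
For the first case, here is a minimal, hypothetical sketch of a helper script that fails and removes every partition of a still-alive disk from all running arrays before unplugging it. The script itself, its parsing of /proc/mdstat, and the disk name passed as $1 are illustrations, not part of the original recipe:

#!/bin/bash
# Hypothetical helper: fail and remove all partitions of the disk named
# in $1 (e.g. "sdb") from every running array, so the disk can be unplugged.
DISK=$1
grep ^md /proc/mdstat | while read md colon rest; do  # "md0 : active raid1 sda1[0] sdb1[1]"
   for member in $rest; do
      part=${member%%\[*}                             # "sdb1[1]" -> "sdb1"
      case $part in
      ${DISK}?*)                                      # only partitions of our disk
         mdadm --fail /dev/$md /dev/$part
         mdadm --remove /dev/$md /dev/$part
         ;;
      esac
   done
done

Called as, e.g., "remove-disk sdb", after which /dev/sdb can be pulled.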

Fully automated hotplug and hot-unplug using UDEV rules

If you need fully automatic handling of hot-plug and hot-unplug events, the UDEV "add" and "remove" events can be used.

Note: the following code has been validated on Debian 5 (Lenny), with kernel 2.6.26 and udevd version 125.

UDEV rules

The rule for the "remove" event can reside in any file in the UDEV rules chain. However, the rule for "add" MUST be placed in a file positioned after the "persistent_storage.rules" file, because it uses the ENV{ID_FS_TYPE} condition, which is set by persistent_storage.rules during "add" event processing.

For this reason, in Debian Lenny I placed the mdadm hotplug rules in the file /etc/udev/rules.d/66-mdadm-incremental.rules. This is the content of the file:

# Process only block devices that udev has identified as Linux RAID members
SUBSYSTEM!="block", GOTO="END_66_MDADM"
ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="END_66_MDADM"
# On hotplug, add the device to its array; on unplug, release it from the array
ACTION=="add",  RUN+="/usr/local/sbin/handle-add-old $env{DEVNAME}"
ACTION=="remove", RUN+="/usr/local/sbin/handle-remove-old $name"
LABEL="END_66_MDADM"

(These rules are based on the UDEV rules contained in the hot-unplug patches by Doug Ledford. Note that the "add" rule passes the full device node, $env{DEVNAME}, e.g. /dev/sdb1, while the "remove" rule passes the bare kernel name, $name, e.g. sdb1; the scripts below expect exactly these forms.)
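
You can check in advance which partitions udev will classify as RAID members: ID_FS_TYPE reflects the filesystem signature on the device, which can also be queried with blkid (used here purely as an illustration; the device name /dev/sdb1 is an assumption):

# blkid -o value -s TYPE /dev/sdb1
linux_raid_member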

And here are the scripts which are called from these rules:

#!/bin/bash
# This is /usr/local/sbin/handle-add-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
mdline=`$MDADM --examine --scan $1` # mdline contains something like "ARRAY /dev/md? level=raid1 num-devices=2 UUID=..."
mddev=${mdline#* }                  # delete "ARRAY " and return the result as mddev
mddev=${mddev%% *}                  # delete everything behind /dev/mdX
$LOGGER "$0 $1"
if [ -n "$mddev" ]; then
   $LOGGER "Adding $1 into RAID device $mddev"
   log=`$MDADM -a $mddev $1 2>&1`
   $LOGGER "$log"
fi

#!/bin/bash
# This is /usr/local/sbin/handle-remove-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
$LOGGER "$0 $1"
mdline=`grep $1 /proc/mdstat`  # mdline contains something like "md0 : active raid1 sda1[0] sdb1[1]"
mddev=${mdline% :*}            # delete everything from " :" till the end of line and return the result as mddev
$LOGGER "$0: Trying to remove $1 from $mddev"
log=`$MDADM /dev/$mddev --fail detached --remove detached 2>&1`
$LOGGER "$log"
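
A quick worked example of the shell parameter expansions used in these scripts (the sample strings are only illustrations of the formats shown in the comments above):

mdline='ARRAY /dev/md0 level=raid1 num-devices=2 UUID=...'
mddev=${mdline#* }     # strip up to the first space  -> "/dev/md0 level=raid1 num-devices=2 UUID=..."
mddev=${mddev%% *}     # strip from the next space on -> "/dev/md0"

mdline='md0 : active raid1 sda1[0] sdb1[1]'
mddev=${mdline% :*}    # strip from " :" to the end   -> "md0"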

Examples of behavior

Suppose we have the following RAID configuration:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      3903680 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

md0 contains the system; md1 is for data (but is not used yet).

If we hot-unplug the disk /dev/sda, /proc/mdstat will show:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

We see that sda1 now has role [2]. Since this RAID1 needs only two components, [0] and [1], role [2] means "spare disk", and the device is marked (F)ailed.

But why does the system think that /dev/sda2 in /dev/md1 is still OK? Because the system has not tried to access /dev/md1 yet (there is no data on /dev/md1). /dev/sda2 will be marked as faulty automatically as soon as we try to access /dev/md1:

# dd if=/dev/md1 of=/dev/null bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.0184819 s, 0.1 kB/s
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[2](F) sdb2[1]
      224612672 blocks [2/1] [_U]

At any point after the disk has been unplugged, we can remove its partitions from an array with just this one command:

# mdadm /dev/md0 --fail detached --remove detached
mdadm: hot removed 8:1
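
The command must be repeated for every array that contained a partition of the unplugged disk; for the two arrays in this example, a one-line sketch:

# for md in /dev/md0 /dev/md1; do mdadm $md --fail detached --remove detached; done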

mdadm versions >= 3.1.2

(to be finished)
