Hotplug

Hardware issues of disk hotplugging are described in the Hotswap chapter of the Hardware issues page.

Linux RAID fully supports hotplug operations starting with the hot-unplug branch of mdadm version 3.1.2.
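
To check whether the installed mdadm is recent enough, you can query its version; the exact wording of the output differs between releases:

mdadm --version     #prints something like "mdadm - v3.1.2 - ..."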

mdadm versions < 3.1.2

Hotplug and hot-unplug from command line

In mdadm versions < 3.1.2, the possibilities for handling hotplug are limited:

  • If the physical disk is still alive:
mdadm --fail /dev/mdX /dev/sdYZ
mdadm --remove /dev/mdX /dev/sdYZ 

Do this for all RAID arrays containing partitions of the failed disk (a scripted version is sketched after this list); the disk can then be hot-unplugged without any problems.

  • If the physical disk is dead or unplugged, just do
mdadm /dev/mdX --fail detached --remove detached
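
A scripted version of the "disk still alive" case might look like this; the device names are only an example (here the failing disk is /dev/sdc with members in /dev/md0 and /dev/md1), so adjust them to your layout:

#Fail and remove every partition of the failing disk from its arrays,
#silently skipping combinations that do not exist
for md in /dev/md0 /dev/md1; do
    for part in /dev/sdc1 /dev/sdc2; do
        mdadm $md --fail $part --remove $part 2>/dev/null
    done
done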

Fully automated hotplug and hot-unplug using UDEV rules

If you need fully automatic handling of hot-plug and hot-unplug events, the UDEV "add" and "remove" events can be used for this.

Note: the following code has been validated on Debian 5 (Lenny), with kernel 2.6.26 and udevd version 125.

Important notes:

  • The rule for the "add" event MUST be placed in a file that is processed after the "persistent_storage.rules" file, because it uses the ENV{ID_FS_TYPE} condition, which is set by persistent_storage.rules during "add" event processing.
  • The rule for the "remove" event can reside in any file in the UDEV rules chain, but let's keep it together with the "add" rule :-)

For this reason, on Debian Lenny I placed the mdadm hotplug rules in the file /etc/udev/rules.d/66-mdadm-incremental.rules. This is the content of the file:

# Only act on block devices whose content was detected as a Linux RAID member
SUBSYSTEM!="block", GOTO="END_66_MDADM"
ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="END_66_MDADM"
# On hotplug, add the partition to its array; on unplug, clean the failed member out
ACTION=="add",  RUN+="/usr/local/sbin/handle-add-old $env{DEVNAME}"
ACTION=="remove", RUN+="/usr/local/sbin/handle-remove-old $name"
LABEL="END_66_MDADM"

(These rules are based on the UDEV rules contained in the hot-unplug patches by Doug Ledford.)
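
Depending on the udev version, the daemon may need to be told about the new rules file, and the rule chain can be dry-run for a single partition before trusting it with real hotplug events. A rough sketch, assuming /dev/sdb1 is a RAID member partition (older udev releases such as udevd 125 spell the option --reload_rules; udevadm test only prints the RUN commands, it does not execute them):

udevadm control --reload-rules     #newer udev; older releases: udevadm control --reload_rules
udevadm test /sys/block/sdb/sdb1   #shows which rules match and which RUN keys would fire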

And here are the scripts that are called by these rules:

#!/bin/bash
#This is /usr/local/sbin/handle-add-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
mdline=`$MDADM --examine --scan $1` #mdline contains something like "ARRAY /dev/md? level=raid1 num-devices=2 UUID=..."
mddev=${mdline#* }                  #delete "ARRAY " and return the result as mddev
mddev=${mddev%% *}                  #delete everything behind /dev/mdX
$LOGGER "$0 $1"
if [ -n "$mddev" ]; then
   $LOGGER "Adding $1 into RAID device $mddev"
   log=`$MDADM -a $mddev $1 2>&1`
   $LOGGER "$log"
fi
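
Before relying on udev, the add script can be exercised by hand: remove a member from an array and let the script put it back. A minimal sketch, assuming /dev/sdb1 is currently an active member of /dev/md0 and that syslog ends up in /var/log/syslog (as on Debian):

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1   #take the member out manually
/usr/local/sbin/handle-add-old /dev/sdb1             #the script should add it back
tail /var/log/syslog                                 #look for "Adding /dev/sdb1 into RAID device /dev/md0"
cat /proc/mdstat                                     #md0 should be recovering with sdb1 back in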

And this is the second script, which handles the "remove" event:

#!/bin/bash
#This is /usr/local/sbin/handle-remove-old
MDADM=/sbin/mdadm
LOGGER=/usr/bin/logger
$LOGGER "$0 $1"
mdline=`grep "$1" /proc/mdstat`  #mdline contains something like "md0 : active raid1 sda1[0] sdb1[1]"
mddev=${mdline% :*}              #delete everything from " :" till the end of line and return the result as mddev
$LOGGER "$0: Trying to remove $1 from $mddev"
log=`$MDADM /dev/$mddev --fail detached --remove detached 2>&1`
$LOGGER "$log"
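
The remove path can be watched end to end: monitor the udev events while a disk is pulled, then check what the script logged. A rough sketch, assuming the pulled disk was /dev/sdb and syslog goes to /var/log/syslog:

udevadm monitor        #run in a separate terminal; prints kernel and udev events as the disk is unplugged
tail /var/log/syslog   #handle-remove-old logs which array it cleaned up
cat /proc/mdstat       #the affected arrays should show the sdb members gone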

Examples of behavior

Let's take the following RAID configuration:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      3903680 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

md0 contains the system; md1 is for data (but it is not used yet).

If we hot-unplug the disk /dev/sda, /proc/mdstat will show:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[0] sdb2[1]
      224612672 blocks [2/2] [UU]

We see that sda1 now has role [2]. Since this RAID1 needs only two components, [0] and [1], role [2] means "spare disk", and it is marked as (F)ailed.

But why does the system think that /dev/sda2 in /dev/md1 is still OK? Because my system has not tried to access /dev/md1 yet (I have no data on /dev/md1). /dev/sda2 will be marked as faulty automatically as soon as I try to access /dev/md1:

# dd if=/dev/md1 of=/dev/null bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.0184819 s, 0.1 kB/s
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2](F) sdb1[1]
      3903680 blocks [2/1] [_U]

md1 : active raid1 sda2[2](F) sdb2[1]
      224612672 blocks [2/1] [_U]

At any point after the disk has been unplugged, we can remove its partitions from an array with just this one command:

# mdadm /dev/md0 --fail detached --remove detached
mdadm: hot removed 8:1
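
The same one-liner can be repeated for every affected array and the result checked in /proc/mdstat; a short sketch, assuming the unplugged disk was /dev/sda and the arrays are md0 and md1:

#Clean the detached members out of every array, then verify
for md in /dev/md0 /dev/md1; do
    mdadm $md --fail detached --remove detached
done
grep sda /proc/mdstat || echo "no sda members left in any array"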

mdadm versions >= 3.1.2

(to be finished)
