Hotplug
Hardware issues of disk hotplugging are described in the Hotswap chapter of the Hardware issues page.

Linux RAID fully supports hotplug operations starting with the Hot-unplug branch of mdadm version 3.1.2.
=mdadm versions < 3.1.2=
Let's assume the following RAID configuration:
 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 sda1[0] sdb1[1]
       3903680 blocks [2/2] [UU]
 
 md1 : active raid1 sda2[0] sdb2[1]
       224612672 blocks [2/2] [UU]
md0 contains the system; md1 is for data (but is not used yet).
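For reference, a mirror pair like this could have been created with commands along the following lines (a sketch only; the device names are taken from the listing above, and metadata defaults vary between mdadm versions):

 # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
 # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2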
If we hot-unplug the disk /dev/sda, /proc/mdstat will show:
 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 sda1[2](F) sdb1[1]
       3903680 blocks [2/1] [_U]
 
 md1 : active raid1 sda2[0] sdb2[1]
       224612672 blocks [2/2] [UU]
We see that sda1 now has role [2]. Since RAID1 needs only two components, [0] and [1], role [2] means "spare disk", and the device is marked (F)ailed.

But why does the system think that sda2 is still OK? Because my system hasn't tried to access /dev/md1 yet (I have no data on /dev/md1). /dev/sda2 will be marked as faulty automatically as soon as I try to access /dev/md1:
 # dd if=/dev/md1 of=/dev/null bs=1 count=1
 1+0 records in
 1+0 records out
 1 byte (1 B) copied, 0.0184819 s, 0.1 kB/s
 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 sda1[2](F) sdb1[1]
       3903680 blocks [2/1] [_U]
 
 md1 : active raid1 sda2[2](F) sdb2[1]
       224612672 blocks [2/1] [_U]
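The degraded state can also be inspected per array with mdadm's query mode (shown here without output, since the exact format depends on the mdadm version):

 # mdadm --detail /dev/md0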
What we can do is remove the failed disk:
 # mdadm /dev/md0 --fail detached --remove detached
 mdadm: hot removed 8:1
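This only cleans up /dev/md0; presumably the same command has to be repeated for /dev/md1, which still lists sda2 as failed. Once a replacement disk is plugged in and partitioned, its partitions can be re-added in the usual way (a sketch, assuming the replacement appears under the same device names):

 # mdadm /dev/md1 --fail detached --remove detached
 # mdadm /dev/md0 --add /dev/sda1
 # mdadm /dev/md1 --add /dev/sda2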
=mdadm versions >= 3.1.2=
(to be finished)