Detecting, querying and testing


This section is about life with a software RAID system: communicating with the arrays and tinkering with them.

Note that when it comes to manipulating md devices, you should always remember that you are working with entire filesystems. So, although there may be some redundancy keeping your files alive, you must proceed with caution.

Detecting a drive failure

Firstly: mdadm has an excellent 'monitor' mode which will send an email when a problem is detected in any array (more about that later).

Of course the standard log and stat files will record more details about a drive failure.

/var/log/messages has always had a knack for filling screens with tons of error messages, no matter what happened. But when a disk crashes, a huge number of kernel errors are reported. Some nasty examples, for the masochists:

    kernel: scsi0 channel 0 : resetting for second half of retries.
    kernel: SCSI bus is being reset for host 0 channel 0.
    kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0
    kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed
    kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00
    kernel: scsi0: Aborting CCB #2669 to Target 0
    kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder
    kernel: SCSI bus is being reset for host 0 channel 0.
    kernel: scsi0: CCB #2669 to Target 0 Aborted
    kernel: scsi0: Resetting BusLogic BT-958 due to Target 0
    kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***

Most often, disk failures look like these,

    kernel: sidisk I/O error: dev 08:01, sector 1590410
    kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002

or these

    kernel: hde: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
    kernel: hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }


And, as expected, a look at the classic /proc/mdstat will also reveal problems,

    Personalities : [linear] [raid0] [raid1] [translucent]
    read_ahead not set
    md7 : active raid1 sdc9[0] sdd5[8]
          32000 blocks [2/1] [U_]


Later in this section we will learn how to monitor RAID with mdadm, so that we can receive alert reports about disk failures. Now it's time to learn more about interpreting /proc/mdstat.

Querying the array status

You can always take a look at the array status by running cat /proc/mdstat. It won't hurt. Take a look at the /proc/mdstat page to learn how to read the file.
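
As a quick, illustrative sketch (the device names and sizes here are only examples), a healthy two-disk RAID-1 looks something like this:

     Personalities : [raid1]
     md5 : active raid1 sdb5[1] sda5[0]
           4200896 blocks [2/2] [UU]

In the [n/m] [UU] part, n is the number of devices the array was defined with and m is how many of them are currently working; a missing or failed member shows up as an underscore instead of a U (for example [2/1] [U_]), and a failed component is flagged with (F) after its role number.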

Finally, remember that you can also use mdadm to check the arrays out.

         mdadm --detail /dev/mdx

This command will show spare and failed disks loud and clear.
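
If you want to check the state from a script rather than by eye, mdadm's --detail mode also accepts a --test flag, which encodes the array health in the exit status (zero for a healthy array, non-zero for a degraded, failed or missing one). A minimal sketch, with /dev/md1 used only as an example name:

     mdadm --detail --test /dev/md1 > /dev/null || echo "md1 needs attention"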

Simulating a drive failure

If you plan to use RAID to get fault-tolerance, you may also want to test your setup, to see if it really works. Now, how does one simulate a disk failure?

The short story is that you can't, except perhaps by putting a fire axe through the drive you want to "simulate" the fault on. You can never know what will happen if a drive dies. It may electrically take the bus it is attached to with it, rendering all drives on that bus inaccessible. The drive may also just report a read/write fault to the SCSI/IDE/SATA layer, which, if done properly, in turn makes the RAID layer handle this situation gracefully. This is fortunately the way things often go.

Remember that you must be running RAID-{1,4,5,6,10} for your array to be able to survive a disk failure. Linear or RAID-0 will fail completely when a device is missing.

Force-fail by hardware

If you want to simulate a drive failure, you can just unplug the drive. If your hardware does not support disk hot-unplugging, you should do this with the power off (if you are interested in testing whether your data can survive with one disk less than the usual number, there is no point in being a hot-plug cowboy here; take the system down, unplug the disk, and boot it up again).

Look in the syslog, and look at /proc/mdstat to see how the RAID is doing. Did it work? Did you get an email from the mdadm monitor?
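
A quick way to check, assuming a classic syslog setup that writes kernel messages to /var/log/messages:

     cat /proc/mdstat
     tail -n 100 /var/log/messages | grep -i -e md -e raid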

Faulty disks should appear marked with an (F) if you look at /proc/mdstat. Also, users of mdadm should see the device state as faulty.

When you've re-connected the disk (with the power off, of course, remember), you can add the "new" device to the RAID again with the mdadm --add command.
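
For example, if the reconnected disk is /dev/sdc2 and the array is /dev/md1 (the same names used in the software test below):

     mdadm /dev/md1 --add /dev/sdc2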

Force-fail by software

You can also simulate a drive failure without unplugging anything. Just running the command

     mdadm --manage --set-faulty /dev/md1 /dev/sdc2

should be enough to fail the disk /dev/sdc2 of the array /dev/md1.
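
In mdadm's manage mode, --fail (or -f) is a synonym for --set-faulty, so the same thing can be written more tersely as:

     mdadm /dev/md1 --fail /dev/sdc2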


Now the fun begins. You should see something like the first line below in your system's log; something like the second line will appear if you have spare disks configured.

     kernel: raid1: Disk failure on sdc2, disabling device.
     kernel: md1: resyncing spare disk sdb7 to replace failed disk


Checking /proc/mdstat out will show the degraded array. If there was a spare disk available, reconstruction should have started.

Another useful command at this point is:

     mdadm --detail /dev/md1

Enjoy the view.

Now you've seen how it goes when a device fails. Let's fix things up.

First, we will remove the failed disk from the array. Run the command

     mdadm /dev/md1 -r /dev/sdc2

Note that mdadm cannot pull an active disk out of a running array. For obvious reasons, only faulty disks can be hot-removed from an array (even stopping and unmounting the device won't help; if you ever want to remove a 'good' disk, you have to tell the array to put it into the 'failed' state as above).
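
If you ever do need to pull a healthy member out on purpose, the fail and remove steps can be chained in a single invocation (a sketch, reusing the device names from this example):

     mdadm /dev/md1 --fail /dev/sdc2 --remove /dev/sdc2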

Now we have a /dev/md1 which has just lost a device. This could be a degraded RAID or perhaps a system in the middle of a reconstruction process. We wait until recovery ends before setting things back to normal.

So the trip ends when we send /dev/sdc2 back home.

     mdadm /dev/md1 -a /dev/sdc2


As the prodigal son returns to the array, we'll see it becoming an active member of /dev/md1 if necessary. If not, it will be marked as a spare disk. That's management made easy.

Simulating data corruption

RAID (be it hardware or software) assumes that if a write to a disk doesn't return an error, then the write was successful. Therefore, if your disk corrupts data without returning an error, your data will become corrupted. This is of course very unlikely to happen, but it is possible, and it would result in a corrupt filesystem.

RAID cannot, and is not supposed to, guard against data corruption on the media. Therefore, it doesn't make any sense to purposely corrupt data (using dd, for example) on a disk to see how the RAID system will handle that. It is most likely (unless you corrupt the RAID superblock) that the RAID layer will never find out about the corruption, but your filesystem on the RAID device will be corrupted.

This is the way things are supposed to work. RAID is not a guarantee of data integrity; it just allows you to keep your data if a disk dies (that is, with RAID levels at or above one, of course).

Monitoring RAID arrays

You can run mdadm as a daemon by using the follow-monitor mode. If needed, that will make mdadm send email alerts to the system administrator when arrays encounter errors or fail. Also, follow mode can be used to trigger contingency commands if a disk fails, like giving a second chance to a failed disk by removing and reinserting it, so a non-fatal failure could be automatically solved.

Let's see a basic example. Running

    mdadm --monitor --daemonise --mail=root@localhost --delay=1800 /dev/md2

should release an mdadm daemon to monitor /dev/md2. The --daemonise switch tells mdadm to run as a daemon. The --delay parameter means that polling will be done at intervals of 1800 seconds. Finally, critical events and fatal errors will be e-mailed to the system manager. That's RAID monitoring made easy.
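
Instead of passing the address on the command line every time, you can also put it in mdadm's configuration file (usually /etc/mdadm.conf or /etc/mdadm/mdadm.conf, depending on the distribution) and let the daemon watch every array it knows about. A minimal sketch:

     # in mdadm.conf
     MAILADDR root@localhost

and then start the monitor with:

     mdadm --monitor --scan --daemonise --delay=1800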

In addition, the --program or --alert parameter specifies a program to be run whenever an event is detected.
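
mdadm invokes that program with the event name, the affected md device and, for some events, the related component device as arguments. A minimal sketch of such a handler (the script path is made up for this example):

     #!/bin/sh
     # /usr/local/sbin/raid-event (hypothetical path)
     # $1 = event name, $2 = md device, $3 = component device (may be empty)
     logger -t raid-monitor "mdadm event $1 on $2 ${3:+(component $3)}"

which would be hooked up with something like:

     mdadm --monitor --scan --daemonise --program=/usr/local/sbin/raid-event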

Note that, when supplying the -f (--daemonise) switch, the mdadm daemon will never exit once it decides that there are arrays to monitor, so it should normally be run in the background. Remember that you are running a daemon, not a shell command. If mdadm is run in monitor mode without the -f switch, it will behave like a normal shell command and stay in the foreground until you stop it.

Using mdadm to monitor a RAID array is simple and effective. However, there are fundamental problems with that kind of monitoring - what happens, for example, if the mdadm daemon stops? In order to overcome this problem, one should look towards "real" monitoring solutions. There are a number of free software, open source, and even commercial solutions available which can be used for Software RAID monitoring on Linux. A search on FreshMeat should return a good number of matches.
