Timeout Mismatch

Introduction

When the OS tries to read from the disk, it sends the command and waits. What should happen is that the drive returns the data successfully.

The proper sequence of events when something goes wrong is this: the drive can't read the data, so it "tries harder" for a few seconds, then gives up (timeout) and returns an error to the OS. The raid code then calculates what the data should be, and writes it back to the disk. Glitches like this are normal and, provided the disk isn't failing, this will correct the problem. A failed read on a sector does not mean that a write to that same sector will fail - the drive can remap a genuinely bad sector when it is written.

Hard Drive timeout vs Kernel timeout

The hard drive timeout is the time it takes for the hard drive to give up and return an error to the OS when it can't read the data. This timeout is drive dependent, and the default value is rarely documented! Some drives let you configure the timeout, others don't. The feature for configuring the hard drive timeout is called "error recovery control" (ERC), although different brands use different names for it: Western Digital calls it "time-limited error recovery" (TLER), while Samsung and Hitachi call it "command completion time limit" (CCTL).

Most cheap modern desktop drives do not support error recovery control. You should check the documentation for the exact model, or find someone with that drive and use the Linux smartctl command to find out.

To check whether the drive has ERC, look at the output of smartctl and see whether SCT Error Recovery Control is supported. You can use the command:

smartctl -l scterc /dev/sdx
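
On a drive where ERC is supported and enabled, the output contains something like the following (the values are in tenths of a second, so 70 means 7 seconds; the exact wording varies between smartctl versions):

SCT Error Recovery Control:
           Read: 70 (7.0 seconds)
          Write: 70 (7.0 seconds)

If the drive does not support SCT error recovery control, smartctl says so instead.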


If ERC isn't supported, you are stuck with the default hard drive timeout. Unfortunately, desktop drives can take over two minutes to give up (and SMR drives are far worse, as we will see later), while the Linux kernel will give up after 30 seconds. When the kernel gives up, the RAID code recomputes the block and tries to write it back to the disk. The disk is still trying to read the data and fails to respond, so the raid code assumes the drive is dead and kicks it from the array. This is how a single read error on these drives can easily kill an array.
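
You can see the kernel's current per-device command timeout, in seconds, in sysfs; on an unmodified system it is normally 30:

cat /sys/block/sda/device/timeout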

The way to handle this is to check whether the drives support ERC, and if so set the drive timeout to 7 seconds. For drives that do not support ERC, the best option is to replace them with ones that do, but if there is no budget for that, increase the kernel timeout to 3 minutes. This way, in the case of a read error, the kernel timeout will be longer than the drive timeout, and you will avoid the situation where a functioning drive gets kicked from the array.
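
For a single drive, the two settings look like this (/dev/sdb is just an example device; smartctl takes the ERC timeout in tenths of a second, so 70 means 7 seconds):

# Drive supports ERC: tell it to give up after 7 seconds
smartctl -l scterc,70,70 /dev/sdb

# Drive does not support ERC: raise the kernel timeout to 180 seconds instead
echo 180 > /sys/block/sdb/device/timeout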

The following script was posted to the mailing list by Brad Campbell. Make sure it runs on every boot (see the systemd example below) - the cheaper drives especially forget any settings you make when the system is shut down. It increases the kernel timeout for all non-ERC drives. It also sets the ERC timeout on drives that support it, as many older desktop drives that do support ERC have inappropriate settings.


#!/bin/bash
# For each drive, try to enable ERC with a 7-second timeout (smartctl takes
# the value in tenths of a second). If the drive refuses, raise the kernel's
# command timeout for that drive to 180 seconds instead.
for i in /dev/sd? ; do
    if smartctl -l scterc,70,70 "$i" > /dev/null ; then
        echo -n "$i is good "
    else
        echo 180 > "/sys/block/${i#/dev/}/device/timeout"
        echo -n "$i is bad "
    fi
    # Print the drive model so the output is readable
    smartctl -i "$i" | grep -E "(Device Model|Product:)"
    # Bump the read-ahead to 512 KiB (setra counts 512-byte sectors)
    blockdev --setra 1024 "$i"
done

WARNING: This does not work for all drives, although the failures seem to be confined to older 2010-era drives (and post-2019 SMR drives). The smartctl command attempts to set the ERC timeout to 7 seconds. This should either succeed and return 0, or fail and return an error code. Unfortunately, on drives that do not support SCT at all, the attempt to set ERC fails but still returns 0, fooling the script. Whenever you get a new drive, you should make sure it behaves as expected - run smartctl -l scterc against it afterwards and check that the reported values really changed.
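
One way to make sure the script runs at every boot is a systemd oneshot service. This is only a sketch - the unit name and the script path below are examples, so adjust them to wherever you saved the script:

# /etc/systemd/system/raid-timeouts.service (name and path are examples)
[Unit]
Description=Set drive ERC / kernel command timeouts for md raid
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/raid-timeouts.sh

[Install]
WantedBy=multi-user.target

Enable it once with "systemctl enable raid-timeouts.service" and it will run at each boot.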

Shingled Magnetic Recording (SMR): the new pandemic

In 2019, a new technology called shingled magnetic recording (SMR) started becoming mainstream. Whereas the workload limits on conventional drives are advisory, the limits on sustained writes to SMR drives are effectively mandatory - once the drive's cache area fills, it has to stop and re-shingle data - and they interfere with raid operation. While all manufacturers have been quietly introducing SMR on their desktop lines, WD unfortunately also introduced it on their "suitable for NAS/RAID" WD Red drives. Combining SMR and RAID is not a good idea, with many reports of new WD Reds simply refusing to be added to an existing array.

While conventional desktop drives (which use conventional magnetic recording, or CMR) may take up to two minutes to give up on a read, SMR drives are even worse - there are reports of them stalling for over 10 minutes as the drive shuffles everything around to make space.

For SMR drives, the drive should report that the trim command is supported. Unfortunately, some (many?) cheaper SMR drives do not, and due to the nature of SMR, drives that don't support trim will have problems, leading to exactly the grief many have reported - the drive stalling almost forever as it has to rewrite masses of data. Note, however, that SMR drives come in at least three types: DM (device managed), which may or may not support trimming; HA (host aware); and HM (host managed), which shouldn't be a problem as they leave it to the computer to sort out. A drive that reports trim properly shows the relevant bits set in its ATA IDENTIFY data (word, bit, value):

69     14          1   Deterministic data after trim supported
69      5          1   Trimmed LBA range(s) returning zeroed data supported
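
If you don't want to decode raw identify words, hdparm will do it for you - a drive that supports trim shows TRIM lines in its identify output (the grep just cuts the output down):

hdparm -I /dev/sdX | grep -i trim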

Unfortunately, some drives do not report these bits. The reason for this may be down to the ATA specification. If I have my facts right, version 4 of the ATA specification postdates these problematic drives, but is required for reporting these capabilities. The drives in question may stick to the v3 specification rather than the provisional v4 spec, reporting something like this (from smartctl -i):

ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)

Unfortunately, it now seems that if you want to run an array, you can NOT use cheap drives made in 2020 or later. As for the current state of affairs (mid 2020): WD has said that the entire WD Red line is now SMR; to get raid-suitable CMR you need to buy Red Plus or Red Pro. You should never have been using Seagate Barracudas anyway, but these have now pretty much all moved over to SMR (and been renamed BarraCuda). Seagate have said that their IronWolf and IronWolf Pro lines will remain CMR, and the FireCuda line seems all CMR at the moment (I guess these will be a bit like the Red Pros, the CMR equivalent of the BarraCuda).

The timeout script from the previous section does nothing to help with SMR drives - their stalls can exceed even the increased kernel timeout.

Conclusion: avoid using SMR drives for RAID.

Reading list

The following links to mailing-list posts were collected by Phil Turmel as background reading on the problem. Read the entire threads if you have time.

http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2

