Easy Fixes

From Linux Raid Wiki
Back to The Badblocks controversy Forward to Replacing a failed drive

This page is meant for short descriptions of all those little emergencies that will make an inexperienced sysadmin panic, where anybody who's met them before will go "shrug, this is a cinch".

Long Term Support Distros

Debian 9 and Ubuntu

If you're trying to reshape an array, and the reshape hangs or the array crashes, it's easy enough to fix but you'll need a recovery disk. Check your mdadm version. If it's 3.4 or similar, the problem is that you have an old mdadm and an old-but-updated kernel. Unsurprisingly, array administration is not regression tested against that combination. You need a matched mdadm and kernel, so you should boot from an up-to-date rescue disk.
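The version check can be sketched roughly as follows. This is a minimal example, not a definitive test: the helper name `mdadm_major`, the sample version line, and the choice to treat any 3.x release as "3.4 or similar" are all assumptions made for illustration.

```shell
# Extract the major version number from a line of `mdadm --version` output.
# Assumed sample format: "mdadm - v3.4 - 28th January 2016".
mdadm_major() {
    printf '%s\n' "$1" | sed -n 's/^mdadm - v\([0-9][0-9]*\)\..*/\1/p'
}

# On a live system the line would come from mdadm itself. Some releases
# print it on stderr, hence the redirect:
#   version_line=$(mdadm --version 2>&1)
version_line="mdadm - v3.4 - 28th January 2016"   # sample value

major=$(mdadm_major "$version_line")
# Assumption: any 3.x release counts as "3.4 or similar".
if [ -n "$major" ] && [ "$major" -lt 4 ]; then
    echo "mdadm $major.x looks old: use an up-to-date rescue disk before reshaping"
fi

# The running kernel version, for comparison with the distro's original one.
echo "running kernel: $(uname -r)"
```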

Stop the array, and try to restart the reshape. This will probably fail, so stop the array again and revert the reshape. You should now be able to restart the array, and then restart the reshape. If you're using a rescue disk, once the reshape is complete you should be able to reboot back into the old distro.
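As a rough sketch, the sequence above might look like this from the rescue disk. The array name `/dev/md0` and the member devices are hypothetical placeholders, and the `run` wrapper is an illustrative helper that only prints each command unless `DRY_RUN=0` is set. `--update=revert-reshape` is an mdadm assemble option, but whether it suits your situation depends on the state of the array, so treat this as an outline rather than a recipe.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi; }

ARRAY=/dev/md0                             # hypothetical array name
MEMBERS="/dev/sdb1 /dev/sdc1 /dev/sdd1"    # hypothetical member devices

# Step 1: stop the array and try to resume the interrupted reshape.
resume_reshape() {
    run mdadm --stop "$ARRAY"
    run mdadm --assemble "$ARRAY" $MEMBERS
}

# Step 2: if that fails, stop again and ask mdadm to undo the reshape.
revert_reshape() {
    run mdadm --stop "$ARRAY"
    run mdadm --assemble --update=revert-reshape "$ARRAY" $MEMBERS
}

# Only fall back to the revert when the plain assemble reports failure.
resume_reshape || revert_reshape

# Afterwards, watch /proc/mdstat; once the array is back in its old shape,
# re-issue the original "mdadm --grow" command to restart the reshape.
```

Run with the defaults, this only prints the commands from the first step so they can be checked against the real device names before anything touches the disks.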

WARNING - Some tools - usually installation or upgrade tools - can go rogue and format disks without asking. Upgrading the distro is one instance where this appears to have happened and damaged the array! Beware of upgrading!

Changing Defaults

Raid-5 journals and bitmaps

As of October 2017, running raid-5 with both a journal and a bitmap has been disallowed. It's pointless to do so, and it causes problems with race conditions in recovery. So if an upgrade makes your array stop running, the fix is to assemble the array with the option "--update=no-bitmap".
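A minimal sketch of that assemble command, assuming a hypothetical array `/dev/md0` with three hypothetical member devices. The `run` wrapper is an illustrative helper that prints each command unless `DRY_RUN=0` is set. Stopping the array first is also an assumption, for the case where a failed start left it half-assembled.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi; }

ARRAY=/dev/md0                             # hypothetical array name
MEMBERS="/dev/sdb1 /dev/sdc1 /dev/sdd1"    # hypothetical member devices

assemble_without_bitmap() {
    # Clear any half-assembled state left behind by the failed start.
    run mdadm --stop "$ARRAY"
    # Drop the bitmap while assembling, keeping the journal.
    run mdadm --assemble "$ARRAY" --update=no-bitmap $MEMBERS
}

assemble_without_bitmap
```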

Raid-0 won't assemble after kernel upgrade

Thanks to a slight cock-up in kernel 3.14, the layout of an asymmetric raid-0 changed. Note that this only applies if you are combining disks of different sizes, so if all your disks are the same size this isn't your problem.

If you're running kernel 5.4 or later, you need to tell it whether the array was created pre-3.14 (type 1) or post-3.14 (type 2). Because using the wrong type can (will) cause data corruption, these kernels refuse to assemble the array unless they know what type it is. This is why a kernel upgrade causes this problem: older arrays don't contain this information until the user adds it manually.
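One way to record the type, sketched under assumptions: recent mdadm releases accept `--update=layout-original` (type 1) and `--update=layout-alternate` (type 2) when assembling, and the kernel also takes a `raid0.default_layout=1` or `raid0.default_layout=2` module parameter. The array name, member devices, and helper names below are hypothetical, and `run` only prints each command unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi; }

ARRAY=/dev/md0                     # hypothetical array name
MEMBERS="/dev/sdb1 /dev/sdc1"      # hypothetical, differently sized members

# $1 is "original" (type 1, old layout) or "alternate" (type 2, new layout).
# Picking the wrong one risks data corruption, so anything else is rejected.
set_raid0_layout() {
    case "$1" in
        original|alternate) ;;
        *) echo "usage: set_raid0_layout original|alternate" >&2; return 1 ;;
    esac
    run mdadm --stop "$ARRAY"
    run mdadm --assemble "$ARRAY" --update="layout-$1" $MEMBERS
}

set_raid0_layout original
```

Storing the layout in the array metadata this way means it travels with the array, whereas the module parameter has to be set again on every machine that assembles it.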

Hardware Troubleshooting

Replacement Controller Cards

If you think your controller is faulty, and have replaced it with a spare you had lying around, make sure the controller and drives are compatible! Some SATA-II controllers cannot handle drives over 2TB.

Boot Problems

CentOS Emergency Mode

CentOS (and presumably other distros) has an emergency mode. Typically such modes suppress module loading, ignore fstab, and disable a whole host of other potentially problematic features.

Of course, this will also suppress a load of features you do want, such as bringing up your raid array. Try doing manually everything you expect the boot process to do for you.
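What "doing it manually" involves depends on the system, but a rough sketch might be: load the md modules, assemble the arrays, check them, then mount the filesystems. The module names below are assumptions (load whichever raid personality your arrays use), and `run` is an illustrative helper that prints each command unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi; }

bring_up_raid() {
    # Emergency mode may have skipped module loading.
    run modprobe md_mod
    run modprobe raid456           # assumption: use raid0/raid1/raid10 as needed
    # Assemble every array mdadm can find from mdadm.conf or the superblocks.
    run mdadm --assemble --scan
    # Confirm the arrays are up before touching any filesystems.
    run cat /proc/mdstat
    # fstab was ignored at boot, so mount its entries now.
    run mount -a
}

bring_up_raid
```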
