Easy Fixes

From Linux Raid Wiki

Latest revision as of 23:43, 22 December 2020


This page is meant for a short description of all those little emergencies that will make an inexperienced sysadmin panic, where anybody who's met them before will go "shrug, this is a cinch".

Long Term Support Distros

Debian 9 and Ubuntu

If you're trying to reshape an array, and the reshape hangs or the array crashes, it's easy enough to fix but you'll need a recovery disk. Check your mdadm version - if it's 3.4 or similar, the problem is that you have an old mdadm paired with an old-but-updated kernel. Unsurprisingly, array administration is not regression tested against that combination. You need a matched mdadm and kernel, so boot from an up-to-date rescue disk.

Stop the array, and try to restart the reshape. This will probably fail, so stop the array again and revert the reshape. You should now be able to restart the array, and then restart the reshape. If you're using a rescue disk, you should be able to reboot back into the old distro once the reshape is complete.
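The sequence above can be sketched as a small helper that only prints the commands, so you can check them before typing anything. This is a sketch under assumptions, not a tested recipe: the function name, /dev/md0 and the member partitions are made up for illustration, it assumes that reassembling is how the reshape gets restarted, and a real revert may need extra options (such as a backup file) that depend on how the reshape was started.

```shell
#!/bin/sh
# print_reshape_recovery_plan ARRAY MEMBER...
# Prints (never runs) the stop / retry / revert sequence for review.
# ARRAY and MEMBER are placeholders supplied by the caller.
print_reshape_recovery_plan() {
    md=$1; shift
    members=$*
    cat <<EOF
# 1. Stop the array and try to restart the reshape by reassembling
mdadm --stop $md
mdadm --assemble $md $members
# 2. If that fails, stop again and revert the reshape
mdadm --stop $md
mdadm --assemble --update=revert-reshape $md $members
# 3. Restart the array if needed, then re-issue your original "mdadm --grow" command
EOF
}

# Hypothetical array and member names, for illustration only
print_reshape_recovery_plan /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
```

Printing rather than executing is deliberate: step 2 should only happen if step 1 fails, and that is a call for the person at the keyboard.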

October 2020 - this has apparently been debugged. The cause seems to be a buggy (thrown together and not really tested) systemd unit. Most Ubuntu installations that old don't run systemd, which is why the problem isn't that common. Hopefully the bug-fixed unit will soon be rolled out and this will be a thing of the past.

WARNING - Some tools - usually installation or upgrade tools - can go rogue and format disks without asking. Upgrading the distro is one instance where this appears to have happened and damaged the array! Beware of upgrading!

Changing Defaults

Raid-5 journals and bitmaps

As of October 2017, running raid-5 with both a journal and a bitmap has been disallowed. It's pointless to do so, and it causes problems with race conditions in recovery. So if an upgrade makes your array stop running, the fix is to assemble the array with the option "--update=no-bitmap".
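As a rough illustration of that fix, here is a dry-run sketch. The function name, /dev/md0 and the member partitions are assumptions, and stopping the array first is only there on the assumption that a half-assembled remnant may be left over from the failed start.

```shell
#!/bin/sh
# Dry run: RUN=echo prints each command instead of executing it.
# Change to RUN= (empty) only once the printed commands look right.
RUN=echo

# assemble_without_bitmap ARRAY MEMBER...
assemble_without_bitmap() {
    md=$1; shift
    # Clear any inactive, half-assembled array left by the failed start
    $RUN mdadm --stop "$md"
    # Reassemble, dropping the bitmap so only the journal remains
    $RUN mdadm --assemble --update=no-bitmap "$md" "$@"
}

# Hypothetical array and member names, for illustration only
assemble_without_bitmap /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
```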

Raid-0 won't assemble after kernel upgrade

Thanks to a slight cock-up in kernel 5.13, the layout of an asymmetric raid-0 has changed. This only applies if you are combining disks of different sizes, so if all your disks are the same size this isn't your problem.

If you're running a post-5.13 kernel, you need to tell it whether the array was created pre-5.13 (type 1) or post-5.13 (type 2). Using the wrong type can (will) cause data corruption, which is why a post-5.13 kernel will refuse to assemble the array unless it knows what type it is. This is also why a kernel upgrade triggers the problem: pre-5.13 arrays don't contain this information until the user adds it manually.
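How the type gets supplied isn't spelled out here. The helper below is a hedged sketch based on the kernel's md documentation, which describes a raid0.default_layout parameter taking 1 or 2, and on the layout-original / layout-alternate values that recent mdadm man pages list for --update. It assumes those map onto type 1 and type 2 above; check both against your own kernel and mdadm before relying on it. The function name is made up.

```shell
#!/bin/sh
# raid0_layout_hints TYPE
# Prints the two ways of declaring the layout for TYPE (1 or 2).
# Nothing is executed or written; the output is for the admin to act on.
raid0_layout_hints() {
    case "$1" in
        1) opt=layout-original ;;    # assumed to match "type 1" above
        2) opt=layout-alternate ;;   # assumed to match "type 2" above
        *) echo "layout type must be 1 or 2" >&2; return 1 ;;
    esac
    echo "kernel command line: raid0.default_layout=$1"
    echo "mdadm assemble option: --update=$opt"
}

raid0_layout_hints 1
```

The kernel parameter applies to every raid-0 array that lacks the information, whereas the mdadm option records it in one array's metadata, so the second is the safer choice if you have arrays of both types.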

I just upgraded to kernel 5.10

This kernel has a bug which shrinks (but does not damage) your array. It's been fixed in 5.10.1, and an upgrade is supposed to put things right.

However, there are various reports that this is not enough, so just run the command:

mdadm --grow --size <size> /dev/mdXXX

If you're using the entire partition or disk, just specify size as "max".

Hardware Troubleshooting

Replacement Controller Cards

If you think your controller is faulty, and have replaced it with a spare you had lying around, make sure the controller and drives are compatible! Some SATA-II controllers cannot handle drives over 2TB.
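To see quickly which drives are in the danger zone, something like the following filter may help. It is a sketch under an assumption: that the "over 2TB" limit comes from 32-bit sector addressing with 512-byte sectors, i.e. 2 TiB (2199023255552 bytes). The function name is made up. It reads "NAME SIZE-in-bytes" lines, which is what lsblk prints with the options shown in the usage comment.

```shell
#!/bin/sh
# Assumed limit: 2^32 sectors * 512 bytes = 2 TiB
LIMIT=2199023255552

# flag_large_drives: read "NAME BYTES" lines on stdin and print every
# drive whose size exceeds LIMIT.
flag_large_drives() {
    awk -v limit="$LIMIT" '$2 + 0 > limit + 0 { print $1 " exceeds 2 TiB: " $2 " bytes" }'
}

# Usage on a live system (whole disks only, sizes in bytes, no header):
#   lsblk -b -d -n -o NAME,SIZE | flag_large_drives
```

Any drive this flags is worth checking against the controller's documentation before you trust it behind the replacement card.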

Boot Problems

CentOS Emergency Mode

CentOS (and presumably other distros) has an emergency mode. Typically such modes suppress module loading, ignore fstab, and skip a whole host of other potentially problematic features.

Of course, this also suppresses a load of features you want, like bringing up your raid array. Try manually doing everything you expect boot to do for you.
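What "manually" means will vary by system, but a plausible minimum is loading the raid module, assembling the arrays, checking them, and mounting what fstab lists. The dry-run sketch below assumes exactly those four steps. The function name is made up, and the module name is something you pass in to match your raid level.

```shell
#!/bin/sh
# Dry run: RUN=echo prints each command instead of executing it.
# Change to RUN= (empty) only once the printed commands look right.
RUN=echo

# bring_up_raid MODULE
# MODULE is the md personality for your array, e.g. raid1, raid10, raid456.
bring_up_raid() {
    module=$1
    $RUN modprobe "$module"         # module loading was suppressed at boot
    $RUN mdadm --assemble --scan    # assemble every array mdadm can find
    $RUN cat /proc/mdstat           # confirm the arrays are active
    $RUN mount -a                   # mount what fstab lists, since it was ignored
}

bring_up_raid raid1
```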
