https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&feed=atom&action=history
Monitoring your system - Revision history
2024-03-29T02:09:57Z
Revision history for this page on the wiki
MediaWiki 1.19.24
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5865&oldid=prev
Anthony Youngman: /* smartctl */
2017-10-12T23:46:53Z
<p><span dir="auto"><span class="autocomment">smartctl</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 23:46, 12 October 2017</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 28:</td>
<td colspan="2" class="diff-lineno">Line 28:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Don't rely on this! Check regularly on a manual basis!</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Don't rely on this! Check regularly on a manual basis!</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="background: #ffa; color:black; font-size: smaller;"><div>== smartctl ==</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">=</ins>== smartctl ==<ins class="diffchange diffchange-inline">=</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">This tool tells you all sorts of information about your drives. When you read the "When things go wrong" section, you will see that smartctl is a very important diagnostic tool, but it also provides a lot of proactive information to help you anticipate a drive failure.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Several S.M.A.R.T. attributes are worth watching, as they provide clues to an impending failure:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> Attribute | Description                   |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> SMART 5   | Reallocated Sectors Count     |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> SMART 187 | Reported Uncorrectable Errors |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> SMART 188 | Command Timeout               |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> SMART 197 | Current Pending Sector Count  |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> SMART 198 | Uncorrectable Sector Count    |</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Backblaze.com (who run huge raid arrays) have a lot of interesting information on their site. They point out that maybe a quarter of their failed drives showed 0 for all five of these statistics, so a healthy SMART report does not necessarily mean a healthy drive. On the other hand, almost none of their drives survive having errors on all five counts.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">smartctl also reports on things like drive temperature, how long the drive has been powered on, how many times it has been started and shut down etc. It's no surprise that drives that get too hot or are otherwise stressed beyond normal limits tend to fail early.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td></tr>
</table>
Anthony Youngman
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5863&oldid=prev
Anthony Youngman: Add smartctl section
2017-10-12T23:18:21Z
<p>Add smartctl section</p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 23:18, 12 October 2017</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 27:</td>
<td colspan="2" class="diff-lineno">Line 27:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Don't rely on this! Check regularly on a manual basis!</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Don't rely on this! Check regularly on a manual basis!</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">== smartctl ==</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td></tr>
</table>
Anthony Youngman
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5671&oldid=prev
Anthony Youngman: update backwards
2017-01-07T18:08:11Z
<p>update backwards</p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:08, 7 January 2017</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|- padding:5px;padding-top:0.5em;font-size: 95%;  </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|- padding:5px;padding-top:0.5em;font-size: 95%;  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="background: #ffa; color:black; font-size: smaller;"><div>| Back to [[<del class="diffchange diffchange-inline">A guide to mdadm</del>]]  </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>| Back to [[<ins class="diffchange diffchange-inline">Scrubbing the drives</ins>]]  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|}</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Many of the horror stories that come to the linux raid mailing list are down to a simple lack of monitoring. Okay, it's not unknown for several disks to fail simultaneously, and if your raid array consists of a bunch of drives all bought at the same time, for the array, the odds of that happening are painfully high - batches of disks tend to have similar lifetimes.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Many of the horror stories that come to the linux raid mailing list are down to a simple lack of monitoring. Okay, it's not unknown for several disks to fail simultaneously, and if your raid array consists of a bunch of drives all bought at the same time, for the array, the odds of that happening are painfully high - batches of disks tend to have similar lifetimes.</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 31:</td>
<td colspan="2" class="diff-lineno">Line 31:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|- padding:5px;padding-top:0.5em;font-size: 95%;  </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|- padding:5px;padding-top:0.5em;font-size: 95%;  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="background: #ffa; color:black; font-size: smaller;"><div>| Back to [[<del class="diffchange diffchange-inline">A guide to mdadm</del>]]  </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>| Back to [[<ins class="diffchange diffchange-inline">Scrubbing the drives</ins>]]  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>|}</div></td></tr>
</table>
Anthony Youngman
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5573&oldid=prev
Anthony Youngman at 12:54, 17 November 2016
2016-11-17T12:54:40Z
<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 12:54, 17 November 2016</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">|- padding:5px;padding-top:0.5em;font-size: 95%; </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">| Back to [[A guide to mdadm]] </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">|}</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Many of the horror stories that come to the linux raid mailing list are down to a simple lack of monitoring. Okay, it's not unknown for several disks to fail simultaneously, and if your raid array consists of a bunch of drives all bought at the same time, for the array, the odds of that happening are painfully high - batches of disks tend to have similar lifetimes.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Many of the horror stories that come to the linux raid mailing list are down to a simple lack of monitoring. Okay, it's not unknown for several disks to fail simultaneously, and if your raid array consists of a bunch of drives all bought at the same time, for the array, the odds of that happening are painfully high - batches of disks tend to have similar lifetimes.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 25:</td>
<td colspan="2" class="diff-lineno">Line 29:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{| style="border:1px solid #aaaaaa; background-color:#f9f9f9;width:100%; font-family: Verdana, sans-serif;"</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">|- padding:5px;padding-top:0.5em;font-size: 95%; </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">| Back to [[A guide to mdadm]] </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">|}</ins></div></td></tr>
</table>
Anthony Youngman
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5547&oldid=prev
Anthony Youngman: /* mdadm */
2016-11-06T18:33:12Z
<p><span dir="auto"><span class="autocomment">mdadm</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:33, 6 November 2016</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 14:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>=== mdadm ===</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>=== mdadm ===</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"> mdadm --monitor --scan --daemonise --mail a@b.co.uk</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">This will fire up mdadm to keep an eye on your arrays. With --daemonise it will run in the background, sending an email</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">to the specified address if it detects any problems related to a disk failure. This is good for remote monitoring, BUT</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">it won't tell you if anything goes wrong with the monitoring itself! You cannot assume - even if you put this in your boot-up</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">sequence, as you should - that you will be notified about important events. It's not unknown for the daemon to fail.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Don't rely on this! Check regularly on a manual basis!</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Analysing a Disk Failure ==</div></td></tr>
</table>
Anthony Youngman
https://raid.wiki.kernel.org/index.php?title=Monitoring_your_system&diff=5543&oldid=prev
Anthony Youngman: Initial creation (wip)
2016-10-16T18:41:22Z
<p>Initial creation (wip)</p>
<p><b>New page</b></p><div>Many of the horror stories that come to the linux raid mailing list are down to a simple lack of monitoring. Okay, it's not unknown for several disks to fail simultaneously, and if your raid array consists of a bunch of drives all bought at the same time, for the array, the odds of that happening are painfully high - batches of disks tend to have similar lifetimes.<br />
<br />
But all too often, an array has been running in a degraded state for months, and then a disk fails and tips the array over the edge. The author's brother told him of a raid array, bought and placed in a colo facility, where a technician just happened to walk past and spot two red lights! The raid 6 array had two failed drives! This should never have happened, and of course there was a mad panic while they tried to safely replace the dead drives.<br />
<br />
== Monitoring Tools ==<br />
<br />
=== /proc/mdstat ===<br />
<br />
You should get to know /proc/mdstat, looking at it often. This will tell you the state of your arrays, and very importantly it will tell you whether any drives have failed, and whether any arrays are degraded. Check, and check regularly!<br />
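As a rough illustration of what "checking" can mean, here is a minimal Python sketch that scans mdstat-style text for degraded arrays. It assumes the usual layout, where each array starts with a line like "md0 : active raid1 ..." followed by a status line ending in something like "[2/2] [UU]", with "_" marking a missing member. The function name degraded_arrays and the sample text are made up for this example and are not part of mdadm. Treat it as a sketch under those assumptions, not as a replacement for looking at the file yourself.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (an assumption, not real output).
SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb2[1](F) sda2[0]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""

# An array header looks like "md0 : active ...".
ARRAY_LINE = re.compile(r"^(md\S*)\s*:")
# The status line carries "[expected/working] [UU_]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array name: member map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current is not None:
            # An underscore in the member map means a slot has no working device.
            if "_" in status.group(3):
                degraded[current] = status.group(3)
            current = None
    return degraded


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without md
    for name, members in degraded_arrays(text).items():
        print(f"{name} is degraded: [{members}]")
```

Run by hand or from cron, this prints one line per degraded array and nothing at all if every array reports all members up.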
<br />
=== xosview ===<br />
<br />
xosview is a venerable utility, and one of the author's favourites. It is capable of displaying the state of raid arrays, but unfortunately the code is currently broken: it reads mdstat and doesn't understand the current output format. As of 2016 it is being updated to read the status directly from /sys, and should hopefully soon be able to display raid status correctly. The author leaves xosview running permanently on his desktop to provide an overview of system performance.<br />
<br />
=== mdadm ===<br />
<br />
== Analysing a Disk Failure ==</div>
Anthony Youngman