Revision as of 19:14, 30 November 2019
dm-integrity has been around for a while (version 1.0 was released with kernel 4.12), but it was written and is maintained by the crypto people as part of LUKS, so it has a number of issues when used with other features such as raid. The good news is that this is simply because it has not been used and tested much outside of LUKS, so any issues found will be treated as bugs and fixed. The bad news is that bug fixes always take time and effort.
To use dm-integrity, the kernel option CONFIG_DM_INTEGRITY in Device Drivers/Multi-device support (RAID and LVM) must be enabled. This will automatically enable CONFIG_BLK_DEV_INTEGRITY.
You will also need the device mapper library (part of the LVM2 package) and integritysetup (part of the cryptsetup package).
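As a quick check of the kernel side, the following is a minimal sketch that scans kernel config text for the two options named above. The function names are made up for this example, and reading /proc/config.gz assumes the running kernel was built with CONFIG_IKCONFIG_PROC; on many distributions the config lives in /boot instead.

```python
import gzip

# The two options this page says dm-integrity depends on.
REQUIRED = ("CONFIG_DM_INTEGRITY", "CONFIG_BLK_DEV_INTEGRITY")


def enabled_options(config_text):
    """Return the names of all options set to y (built in) or m (module)."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not enabled, in order."""
    on = enabled_options(config_text)
    return [opt for opt in required if opt not in on]


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    sample = "CONFIG_DM_INTEGRITY=m\n# CONFIG_BLK_DEV_INTEGRITY is not set\n"
    print(missing_options(sample))
```

An empty list means both options are enabled. The sketch does not check for the userspace tools; `shutil.which("integritysetup")` would be one way to do that.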
Run integritysetup to set up a partition with integrity support. By default the data and the integrity metadata are stored on the same partition, but it looks as if the metadata can instead be stored on a separate partition, which would let you enable integrity on an existing partition.
That suggests that adding integrity does not change the partition id, but I have not confirmed this.
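The setup step can be sketched as a pair of integritysetup invocations. This is a hedged sketch that only builds and prints the command lines without running anything. The device paths and the mapping name are placeholders, the helper names are invented for this example, and the use of --data-device for the separate-metadata layout is taken from the integritysetup man page, not tested here. Whether existing data survives the format step is exactly the open question above, so try it on a scratch device first.

```python
import shlex


def format_cmd(device, data_device=None):
    """Command line to initialise integrity metadata on `device`.

    If `data_device` is given, `device` is meant to hold only the
    integrity tags and journal, with the data on `data_device`
    (assumption based on the man page description of --data-device).
    """
    cmd = ["integritysetup", "format"]
    if data_device is not None:
        cmd += ["--data-device", data_device]
    cmd.append(device)
    return cmd


def open_cmd(device, name, data_device=None):
    """Command line to activate the mapping as /dev/mapper/<name>."""
    cmd = ["integritysetup", "open"]
    if data_device is not None:
        cmd += ["--data-device", data_device]
    cmd += [device, name]
    return cmd


if __name__ == "__main__":
    # Placeholder devices; nothing is executed, the commands are only shown.
    for cmd in (format_cmd("/dev/sdX1"), open_cmd("/dev/sdX1", "int-sdX1")):
        print(shlex.join(cmd))
```

Once opened, the device that goes into the array is the mapping under /dev/mapper, not the raw partition.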
Using dm-integrity in an array
dm-integrity should be enabled as a matter of course on any raid array, but for now that would be a little risky: the combination is new and still has issues. Getting it to work should nonetheless be a matter of urgency. Raid is designed to protect against disk failure, so without extra help it can neither detect corruption nor protect against it.
If dm-integrity finds that data is corrupt when it is read, it returns a read error instead of the data. I don't yet know exactly which error code raid sees, or whether raid is equipped to handle it, but once it all works, this will cause raid's error-handling code to kick in and return the correct data.
Using dm-integrity to recover an array
Ultimately, if you lose too many drives you have lost the array. But if a number of drives have simply been kicked because of unrecoverable read errors, you may be able to recover the array, provided you can salvage the rest of the data. The intention is that dm-integrity can be configured so that a "read before write" fails: reading a sector that has not yet been written should return EILSEQ, the error for an integrity failure.
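To see what userspace actually gets back, here is a minimal sketch that reads one block and classifies the result. The function names and return strings are invented for this example. It assumes that an integrity failure surfaces as EILSEQ; depending on how the read is issued it may well arrive as a plain EIO instead, so both are reported separately.

```python
import errno
import os


def classify_read_error(exc):
    """Map an OSError from a read to a rough cause."""
    if exc.errno == errno.EILSEQ:
        return "integrity"  # checksum mismatch (assumed dm-integrity code)
    if exc.errno == errno.EIO:
        return "io"         # generic I/O error, e.g. a bad sector
    return "other"


def probe(path, offset, size=4096):
    """Read `size` bytes at `offset`; return "ok" or the error class."""
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.pread(fd, size, offset)
        finally:
            os.close(fd)
        return "ok"
    except OSError as exc:
        return classify_read_error(exc)


if __name__ == "__main__":
    # Point this at a /dev/mapper device to test; a regular file reads "ok".
    print(probe(__file__, 0))
```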
The replacement device needs to be set up by integritysetup with the --no-wipe option. Because the checksums are then never initialised, this should leave the whole drive unreadable until it is written to. A ddrescue copy onto it will give valid checksums to all the data that ddrescue salvages, while anything that won't copy is left unreadable on the new drive. Obviously, if the array is still readable, simply dropping the new drive in will work fine.
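The order of operations for one drive can be sketched as follows. This is a hedged outline that only prints the commands. The device names, mapping name and map file are placeholders, the function name is invented for this example, and the ddrescue -f flag (needed to write to a block device) and argument order follow the GNU ddrescue manual. Nothing here has been tested against a real failing drive.

```python
import shlex


def recovery_plan(old_dev, new_dev, name, mapfile):
    """Return the commands to copy a failing drive onto an unwiped
    integrity device, in the order they would be run."""
    mapped = "/dev/mapper/" + name
    return [
        # Skip the initial wipe so untouched sectors keep invalid checksums.
        ["integritysetup", "format", "--no-wipe", new_dev],
        ["integritysetup", "open", new_dev, name],
        # Copy through the mapping so every salvaged sector gets a valid tag.
        ["ddrescue", "-f", old_dev, mapped, mapfile],
    ]


if __name__ == "__main__":
    plan = recovery_plan("/dev/sdOLD", "/dev/sdNEW", "rescue0", "rescue0.map")
    for cmd in plan:
        print(shlex.join(cmd))
```

The important detail is that ddrescue writes to the /dev/mapper device and not to the raw new drive; writing to the raw drive would bypass dm-integrity entirely.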
If, however, multiple drives have failed, you need to copy all of them and hope that no individual stripe has suffered more failures than the array's redundancy can cover. Once all the drives have been copied, an integrity check needs to be run over the array so that each stripe is read in its entirety and all damaged sectors are recalculated and rewritten.
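Whether the array can be saved comes down to counting failures per stripe. The sketch below illustrates that arithmetic only. It assumes you have already worked out, for each drive, which stripe numbers contain unreadable data (for example from the ddrescue map files); that conversion depends on chunk size and layout and is not shown. The drive names and stripe numbers are made up.

```python
from collections import Counter


def lost_stripes(bad_by_drive, redundancy):
    """Return the stripes damaged on more drives than raid can rebuild.

    bad_by_drive: dict of drive name -> iterable of damaged stripe numbers
    redundancy:   failures tolerated per stripe (1 for raid5, 2 for raid6)
    """
    failures = Counter()
    for stripes in bad_by_drive.values():
        # A set ensures each drive counts at most once per stripe.
        failures.update(set(stripes))
    return sorted(s for s, n in failures.items() if n > redundancy)


if __name__ == "__main__":
    bad = {"sda": {3, 10}, "sdb": {10, 42}, "sdc": set()}
    print(lost_stripes(bad, redundancy=1))  # raid5: stripe 10 is hit twice
    print(lost_stripes(bad, redundancy=2))  # raid6: nothing is lost
```

Any stripe in the returned list cannot be reconstructed from parity and will have to be restored some other way; every other damaged stripe should be repairable.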