SMART
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a technology built into most HDDs and SSDs that lets them run self-checks and report whether they are healthy or likely to fail.
The smartctl tool is provided by the smartmontools package on most systems.
Check if a device can run SMART
smartctl --info {device}
- Look for the "SMART support is" lines.
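The check above can be scripted. This is a minimal sketch, assuming the ATA-style "SMART support is:" lines that smartctl prints for typical SATA drives; the helper name classify_smart_support is made up for illustration, and other device types may not print these lines at all.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl --info` output on stdin and prints
# "enabled", "available" (supported but switched off) or "unknown".
classify_smart_support() {
    info=$(cat)
    if printf '%s\n' "$info" | grep -q '^SMART support is: *Enabled'; then
        echo enabled
    elif printf '%s\n' "$info" | grep -q '^SMART support is: *Available'; then
        echo available
    else
        echo unknown
    fi
}

# Usage (as root; /dev/sdX is a placeholder):
#   smartctl --info /dev/sdX | classify_smart_support
```

If SMART is available but not enabled, smartctl -s on {device} turns it on.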
Check which tests can be run on the device:
smartctl --capabilities {device} (or -c)
- Look for the "{test name} self-test routine" lines: the approximate duration of each test is shown alongside.
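To pick out just those lines, something like the following may help. It is a sketch that assumes the duration ("recommended polling time") is printed on the line right after each "self-test routine" line, which is how smartctl usually lays it out for ATA drives; show_selftest_durations is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl --capabilities` output on stdin and
# prints each "self-test routine" line plus the line that follows it.
show_selftest_durations() {
    awk '/self-test routine/ { print; if ((getline line) > 0) print line }'
}

# Usage (as root; /dev/sdX is a placeholder):
#   smartctl -c /dev/sdX | show_selftest_durations
```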
Run a test
smartctl -t {testname: short, long or conveyance} {device}
- Wait for the amount of time indicated. The command returns immediately, but the test itself keeps running on the drive.
smartctl -l selftest {device} to see the test results; smartctl -H {device} shows the drive's overall health assessment.
If the test fails, back up your data to another drive immediately, since the disk is probably going to fail soon.
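The whole sequence can be wrapped in a small function. This is only a sketch: run_short_test is a hypothetical name, and the default wait of 2 minutes is an assumption that should be replaced by the duration the drive reports. Here -l selftest prints the self-test log and -H prints the overall health verdict.

```shell
#!/bin/sh
# Hypothetical helper: start a short self-test, wait, then print the
# self-test log and the overall health assessment.
#   $1 = device, $2 = minutes to wait (assumed default: 2)
run_short_test() {
    dev=$1
    wait_minutes=${2:-2}
    # Returns immediately; the drive runs the test on its own.
    smartctl -t short "$dev" || return 1
    sleep $((wait_minutes * 60))
    smartctl -l selftest "$dev"
    smartctl -H "$dev"
}

# Usage (as root; /dev/sdX is a placeholder):
#   run_short_test /dev/sdX 2
```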
See also the "Is this disk dying?" question on Stack Exchange.
Badblocks
badblocks is a program that scans a partition, performing read (and optionally write) tests to determine whether any sector is faulty.
If the partition to scan hosts an ext2/3/4 filesystem, it is better to use e2fsck -c, since it calls badblocks with the block size that the filesystem is using.
badblocks -vs -o {output_file} {device}
- Performs a read-only test of the whole device (it tries to read every single block) and stores the list of bad blocks in the output file. -v gives verbose output and -s shows progress.
Options can be given to perform a different test:
- -n: perform a non-destructive read-write test: for every block, the original contents are saved, test data is written and read back to check for differences, and the original contents are restored afterwards. Making a backup before this test is always recommended.
- -w: perform a destructive write test: write patterns to every block and check that they can be read back. The default is to write four patterns (0xaa, 0x55, 0xff, 0x00) to each block; other patterns can be specified with the -t option (see the manual). This test wipes everything on the disk.
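After a scan, the output file holds one block number per line, so counting its lines gives the number of bad blocks. A minimal sketch (count_bad_blocks and the file name are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: counts the block numbers listed in a badblocks
# output file (one number per line; an empty file means no bad blocks).
count_bad_blocks() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Usage (as root; /dev/sdX and the file name are placeholders):
#   badblocks -vs -o bad_blocks.txt /dev/sdX
#   echo "bad blocks found: $(count_bad_blocks bad_blocks.txt)"
```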
The output file is a list of bad blocks on the disk. It can be used with the -l option of e2fsck or mke2fs to tell an ext2/3/4 filesystem not to use these blocks (see their manuals).
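The block numbers in the list only make sense for the block size badblocks was run with, so that size has to match the filesystem's. The sketch below reads the block size from dumpe2fs -h and passes it to badblocks with -b; fs_block_size and the file name are hypothetical, and the "Block size:" label is assumed to be formatted as dumpe2fs normally prints it.

```shell
#!/bin/sh
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and prints the
# filesystem block size in bytes.
fs_block_size() {
    awk '/^Block size:/ { print $3 }'
}

# Usage (as root, filesystem unmounted; names are placeholders):
#   bs=$(dumpe2fs -h /dev/sdX1 2>/dev/null | fs_block_size)
#   badblocks -b "$bs" -vs -o bad_blocks.txt /dev/sdX1
#   e2fsck -l bad_blocks.txt /dev/sdX1
```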
e2fsck
If you want to check a partition hosting an ext2/3/4 filesystem, using e2fsck is recommended. With the -c option it calls badblocks with the correct options. Faulty blocks found this way are automatically added to the filesystem's list of bad blocks, so the filesystem will not use them.
This will modify the filesystem, potentially corrupting files whose contents are in bad blocks, but these files are technically already corrupted, since they are in faulty blocks.
e2fsck -cDfty -C 0 {partition}
- Add a second -c to perform a non-destructive read-write test. Making a backup first is always advised: the test should be non-destructive (the original value is read, the block is tested for writing, and the original value is written back), but you never know.
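e2fsck should not be run on a mounted filesystem, so a guard like the one below can be put in front of the command. It is a sketch for Linux only: is_mounted is a hypothetical helper that compares the given path against the first column of /proc/mounts, and it will miss devices mounted under another name (symlinks, /dev/mapper aliases).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given device path appears as a mount
# source in /proc/mounts (Linux only; does not resolve symlinks or aliases).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts
}

# Usage (as root; /dev/sdX1 is a placeholder):
#   if is_mounted /dev/sdX1; then
#       echo "unmount /dev/sdX1 first" >&2
#   else
#       e2fsck -cDfty -C 0 /dev/sdX1
#   fi
```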
List bad blocks and their contents
- Run bad_blocks_files.sh, a script found on Stack Exchange.
For reference, to print all the bad blocks registered on an ext2/3/4 filesystem, run dumpe2fs -b {partition}.
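As a rough idea of how bad blocks can be traced back to files (this is not the script mentioned above, just a sketch under assumptions): debugfs has an icheck command that maps a block number to the inode using it, and an ncheck command that maps an inode number to a path. The function name list_bad_block_owners is hypothetical, and the output is the raw icheck output, left unparsed.

```shell
#!/bin/sh
# Hypothetical helper: for every bad block registered in the filesystem,
# ask debugfs which inode (if any) owns it.
list_bad_block_owners() {
    part=$1
    # dumpe2fs -b prints one bad block number per line.
    dumpe2fs -b "$part" 2>/dev/null | while read -r blk; do
        [ -n "$blk" ] || continue
        debugfs -R "icheck $blk" "$part" 2>/dev/null </dev/null
    done
}

# Usage (as root; /dev/sdX1 and {inode} are placeholders):
#   list_bad_block_owners /dev/sdX1
#   debugfs -R "ncheck {inode}" /dev/sdX1   # inode number -> file path
```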