According to a survey in the UK, the top five causes of data loss are (in order):

  • Hardware Failure

  • Human Error

  • Software Corruption

  • Malware (Viruses/Ransomware/etc)

  • Natural Disaster

Causes_of_data_loss.png


Since hardware failure is number one on this list, we’ll take a closer look at its root causes and how they can be greatly reduced.

Losing data to failed hardware can happen at any point in the chain from the drive to the computer reading it. Disk drive manufacturers go to a lot of effort to ensure their drives are reliable, but anything with moving parts is inherently more likely to fail than a solid-state circuit board.

HDD manufacturers claim 100K to 1M hours (roughly 11 to 114 years), but the reality seems to be more like 5 years (see the Backblaze study), with an early peak in failures for new drives and then a longer peak after 4 years as moving parts wear and lubricant seeps out and separates.
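To make the hours-to-years arithmetic concrete, here is a minimal Python sketch. It assumes the manufacturer figure is a mean time between failures (MTBF) and that drives fail at a constant rate (an exponential model). Both are my assumptions for illustration, not something the manufacturers or Backblaze state, and the function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_hours_to_years(mtbf_hours):
    """Convert a claimed lifetime in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Chance of failing within one year of continuous use.

    Assumes a constant failure rate (exponential model), which real drives
    do not follow exactly: it ignores the early and late failure peaks.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (100_000, 1_000_000):
    years = mtbf_hours_to_years(mtbf)
    afr = annualized_failure_rate(mtbf)
    print(f"{mtbf:>9,} hours = {years:6.1f} years, annual failure rate ~ {afr:.2%}")
```

Under these assumptions, even the low end of the claimed range implies an annual failure rate under 10%, which is one way to see how optimistic the datasheet numbers are compared with a 5-year real-world life.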

Finding similar numbers for system boards and drive controllers is more difficult; however, I did find a claim by Gigabyte that their motherboards could expect 100K hours in normal conditions, which seems reasonable in my experience.

I’ve observed a number of system board failures in my career. The vast majority were running in poor conditions -- especially laptops, which see a lot more dust and heat than desktop PCs. The few boards that failed in clean data centers (clean power and low dust) were mostly very old and due to be replaced.

Assuming you have a system with a single hard-drive, your risks look like this:

annual_data_loss-hw_fail.png

It starts to look a little scary around year 5, but the accumulated risk over time is even more of an eye-opener.

accumulated_data_loss-hw_fail.png
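For readers who want to see how annual risk turns into accumulated risk, here is a minimal Python sketch. It assumes each year's failure chance is independent of the others. The annual figures in it are hypothetical placeholders, not the values behind the charts or the Backblaze data.

```python
def accumulated_risk(annual_risks):
    """Return the running chance of at least one failure by the end of each year.

    The chance of surviving N years is the product of surviving each year,
    so the accumulated risk is 1 minus that product.
    """
    survival = 1.0
    accumulated = []
    for p in annual_risks:
        survival *= 1.0 - p
        accumulated.append(1.0 - survival)
    return accumulated


# Hypothetical annual failure probabilities for years 1 through 5.
example_annual = [0.05, 0.02, 0.02, 0.03, 0.06]

for year, risk in enumerate(accumulated_risk(example_annual), start=1):
    print(f"year {year}: accumulated risk {risk:.1%}")
```

The point of the exercise is that modest-looking yearly numbers compound: the accumulated figure only ever goes up, and it climbs fastest in the years where the annual risk peaks.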

Significantly, those are the odds with a single drive. Next, we’ll look at how the odds change when we replace a single PC drive with a mirror or RAID5 system (both can recover from a single drive loss) or a RAID6 system (which can recover from two failed drives).

annual_data_loss-drive_fail_v_raid.png

Again, we’ll show you the risk for each year and then the accumulated risk across ten years.

accumulated_data_loss-drive_fail_v_raid.png
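As a rough illustration of why redundancy changes the odds, here is a simplified Python sketch. It assumes drives fail independently and that a failed drive is not replaced during the period, so it overstates the risk for a well-maintained array and ignores correlated failures (same batch, same power event). The drive counts and the 5% per-drive figure are hypothetical, and this is not the model used to produce the charts above.

```python
from math import comb


def array_loss_probability(n_drives, tolerated, p_drive):
    """Chance that more than `tolerated` of `n_drives` fail in the same period.

    Uses the binomial distribution: sum the chance of 0..tolerated failures
    (the array survives) and subtract from 1.
    """
    p_survive = sum(
        comb(n_drives, k) * p_drive**k * (1 - p_drive) ** (n_drives - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - p_survive


P_DRIVE = 0.05  # hypothetical per-drive failure chance for the period

# (label, number of drives, failures the layout can absorb)
layouts = [
    ("single drive", 1, 0),
    ("mirror (2 drives)", 2, 1),
    ("RAID5 (4 drives)", 4, 1),
    ("RAID6 (6 drives)", 6, 2),
]

for label, n, tolerated in layouts:
    print(f"{label:<20} {array_loss_probability(n, tolerated, P_DRIVE):.4%}")
```

Note that adding drives cuts both ways in this model: more drives means more chances for something to fail, so a wide RAID5 array can carry more risk than a simple mirror even though both tolerate one failure.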