You may have been reading lately about the END OF THE WORLD… oh wait, sorry, just about how RAID-5 is completely screwed over the next few months. Well, I am here to try to help shed some light on the situation and, hopefully, not spread more crap onto it.
So, what are unrecoverable errors?
First, some clarification. These types of errors are seemingly referred to with 3 different names:
Bit Error Rate (BER)
Unrecoverable Bit Error (UBE)
Unrecoverable Read Error (URE)
However, these all refer to the same problem: over time, drives will simply fail to read some data back due to some kind of error. It doesn’t matter what the cause of the error is; most vendors don’t go into that kind of detail. These error rates range from 1 error in 10^14 bits read, up to 1 error in 10^16 bits read. So, what does that mean:
|Error Rate|how many bits is that?|how many bytes?|Real World numbers|
|---|---|---|---|
|1 in 10^14|100,000,000,000,000|12,500,000,000,000|one error per ~12.5TB read|
|1 in 10^15|1,000,000,000,000,000|125,000,000,000,000|one error per ~125TB read|
|1 in 10^16|10,000,000,000,000,000|1,250,000,000,000,000|one error per ~1,250TB read|
This means that a drive with an error rate of 1 in 10^14 bits will see an error around once every 12.5TB read from it. If you have a 2TB drive, write 2TB to it, and then fully read it back just over 6 times, you will, theoretically speaking, run into one read error. A 1TB drive will take just over 12 full reads to hit one. The numbers go up by an order of magnitude (that is, multiply by 10) for each 10^x improvement: a 1TB drive with an error rate of 1 in 10^15 bits will take about 125 full reads to run into a read error; at 1 in 10^16 bits, it will take about 1,250 full drive reads.
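The arithmetic above is simple enough to sketch in a few lines of Python. The function name is mine, not from any tool; it just turns a vendor error-rate spec into "expected full drive reads before one error":

```python
def reads_before_ure(drive_tb, error_rate_bits):
    """Expected number of full reads of a drive before one URE, on average.

    drive_tb: drive capacity in terabytes (decimal, as vendors spec it)
    error_rate_bits: the '1 error in N bits' figure, e.g. 1e14
    """
    drive_bits = drive_tb * 1e12 * 8        # TB -> bytes -> bits
    return error_rate_bits / drive_bits

# 2TB drive at 1 in 10^14: one error roughly every 6.25 full reads
print(reads_before_ure(2, 1e14))    # 6.25
# 1TB drive at 1 in 10^15: roughly every 125 full reads
print(reads_before_ure(1, 1e15))    # 125.0
```

Note this is an average over the spec, not a guarantee; any individual read can hit an error, or you can go far longer without one.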
The RAID problem
The problem becomes a concern when you mix together two issues: 1) drives approaching 2TB each, and 2) RAID arrays stringing lots of these drives together. A RAID 5 array using 8 drives is often set up with 7 drives holding data and parity, plus one hot spare. With 2TB drives, that gives you a 12TB array. If one drive fails and the system tries to recover the data onto the hot spare, it has to read 12TB worth of data (even the empty space) to reconstruct the failed drive. Now you run into the possibility of a read error. Mind you, RAID 5 normally doesn’t care about read errors, because it has the redundant information needed to recover the missing bits (or bytes). During a rebuild, however, there is no redundant data left.
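You can put a rough number on this. A back-of-the-envelope sketch (mine, and it assumes every bit fails independently with the spec'd probability, which is a simplification of how vendors actually characterize the rate) asks: what are the odds a 12TB rebuild hits at least one URE?

```python
import math

def rebuild_ure_probability(array_tb, error_rate_bits):
    """Probability of at least one URE while reading array_tb terabytes.

    Assumes independent per-bit failures at 1/error_rate_bits -- a toy
    model, not a vendor's reliability methodology.
    """
    bits_read = array_tb * 1e12 * 8
    p_bit = 1.0 / error_rate_bits
    # P(no error) = (1 - p)^n; log1p/expm1 keep the tiny numbers accurate
    return -math.expm1(bits_read * math.log1p(-p_bit))

for exp in (14, 15, 16):
    p = rebuild_ure_probability(12, 10.0 ** exp)
    print(f"1 in 10^{exp}: {p:.1%}")
```

Under these assumptions, a 12TB rebuild on 10^14-rate drives fails to complete cleanly roughly 62% of the time, versus about 9% at 10^15 and about 1% at 10^16, which is why the error-rate spec matters so much more than it used to.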
At best, the RAID array will skip the missing data, which causes data loss. At worst, the array could fail the drive with the read error, causing the system to see two bad drives at once. Without other action, this would be the “all your data is belong to us” problem, better known as the “pray you have a backup of either all of your data, or at least your resume” problem.
The Chicken Littles
The concern over the failure rates of drives has always been an issue. There is extensive anecdotal evidence concerning bad batches of drives, and then of course there were the “DeathStar” failures of the IBM Deskstar 75GXP drives.
However, this problem of error rates, rather than full drive failures, has flown under the radar for most sysadmins. There have been those speaking about it. Most recently, there are those who have gone for a more “shock” factor in their proclamations of drive and RAID failures. Mind you, they are not completely incorrect, but their wording makes it seem like RAID arrays (and RAID 5 in particular) are doomed.
There is this article in ComputerWorld by Jerome Wendt, talking about why “using SATA drives to store large amounts of de-duplicated data is not always the match made in heaven that vendors make them out to be.” I don’t disagree with his findings, but I don’t care for how ‘thin’ his treatment of the subject was. He did follow up on this site, but I become concerned when I read phrases like “no matter what the manufacturers say, their RAID systems are probably more prone to failure in the real world than even they realize or can document.” I feel that phrases like that tend to let the vendors off the hook.
However, the real ‘the sky is falling’ article has to be RAID 5 stops in 2009 by Robin Harris. Robin goes so far as to lament that “RAID 6 in a few years will give you no more protection than RAID 5 does today.” His argument is that error rates are steady, while capacities are doubling every year.
The Nay Sayers
While there are others out there, this is the one I found that really made me stop. Tom Treadway, from Adaptec, created a RAID reliability calculator. The basis for those calculations was this article that he wrote.
I find it strange how someone can conclude that SATA with an error rate of 1 in 10^14 is 171 times better than SAS with an error rate of 1 in 10^15. Granted, he did put the SAS in a RAID 5 array, and the SATA in a RAID 6, with its double parity striping and higher failure tolerance.
The Real World
Now, there are some great, and not so great, articles out there which go into a lot of detail about this problem. First off is the Hughes and Murray article from the ACM Transactions on Storage, Vol 1, No 1, December 2004. It is very detailed, with lots of good information. However, it is a little dated, since it describes SATA drives as primarily having 10^14 error rates, and FC drives as having 10^15. Modern ‘Enterprise’ drives have 10^15 rates for SATA and 10^16 rates for FC.
Next up is an article advertising Sun’s StorageTek 6140 RAID 6 array by Said A Syed. While I do not doubt that RAID 6 is far better than RAID 5, I usually read any paper trying to sell a particular product as being biased toward what that product does.
Finally, here is a file with a listing of drives I could find on 3 vendors’ web sites: Hitachi, Seagate, and Western Digital. It lists the make, model, available sizes, the error rate, and the interface type(s). As you can see, the FC and SAS drives have better error-rate specs than SATA. Also, contrary to Robin Harris’ statement that error rates are steady, these rates, when compared to the numbers in the Hughes and Murray article, show that rates have improved by an order of magnitude (multiplied by 10) since 2004.
This also shows that even while SATA drives have increased in size by 4 to 5 times, the error rate has improved by 10 times. The largest FC drives I can find are just a little over 10 times the 2004 sizes, while the error rate has also improved by 10 times. This could become a concern if the sizes of FC and SAS drives start catching up to SATA sizes without drastic improvements in their error rates.
There is hope, not just for the future, but for now
I am not going to leave you out in the cold with just the numbers to keep yourself warm. Here are my recommendations, and since I am not selling anything, they are as unbiased as I can make them.
0. Where possible, switch to RAID 6, or if in a pinch, RAID 1+0.
RAID 6 uses two independent parity blocks per stripe. This means that it can lose 2 drives and still be usable. For those not running major storage arrays, but just using home setups, RAID 1+0 (NOT 0+1) might be your best choice. It can lose up to half of its drives and still function, as long as at least one drive from each mirrored pair survives.
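To see why a single parity block tolerates exactly one loss, here is a toy sketch of RAID 5 style XOR parity (the block contents and helper name are made up for illustration; real controllers work on whole stripes, not 4-byte toys):

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(data)            # RAID 5 stores one parity block

# Lose ONE block: XOR of the survivors plus parity rebuilds it exactly.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])            # True

# Lose TWO blocks (a failed drive plus a URE during rebuild) and a single
# parity block cannot recover either one -- the gap RAID 6's second,
# independently computed parity block is there to close.
```

The same algebra is why a URE during a RAID 5 rebuild is unrecoverable: the failed drive already consumed the one redundancy the stripe had.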
1. Demand better SATA drives for your RAID.
Not all SATA drives are made alike. Keep this in mind when you are looking at storage systems, and ask your vendor which drives they use. The Seagate Barracuda ES and ES.2, WD’s latest drives, and the Hitachi Ultrastar line all carry 10^15 error rates.
2. Switch to FC or SAS drives.
While this is a more expensive option, it provides you with the best quality drives available. Error rates are normally in the 10^16 range. All of the FC drives listed were in this range, and the only SAS drives that were not were the models offered in both SATA and SAS versions.
3. Keep backups.
You should be doing this already. Face it, even if it is to another drive array, this is your best option. While it always increases the cost and complexity of a system, it is always more valuable than the alternative.