Storage Folks

Think Storage, Dream Storage, Talk Storage.

RAID Unleashed (Beauty of RAID)

Hi Folks,

I started my storage career by trying to understand RAID. It was really confusing when I first started reading about it, but over time, analyzing the logic and understanding the different RAID levels made me realize that this is one of the most beautiful things the researchers at the University of California, Berkeley ever produced. Without RAID, forget everything your great boxes from NetApp, EMC, Hitachi and others can do; they would be just useless.

Here I’m gonna explain RAID in very basic terms that I think everyone will understand. Why wait? Let’s dig into it. It’s a bit lengthy and is meant to be a tutorial rather than a blog post.

What is RAID?

RAID is generally expanded as Redundant Array of Independent Disks. Basically it is a group of small hard disks (HDDs) combined to simulate one large hard disk, with the advantages of better performance and data protection. For example, if I want 1000GB today, I’ll buy ten 100GB HDDs and put a piece of software on top that combines all these HDDs and presents them to the server as a single HDD. This software also increases performance by striping and gives data protection by mirroring or parity generation. (I’ll explain striping, mirroring and parity, and how they improve performance and data protection, in the next few paragraphs.) Sometimes this piece of software is installed on the operating system and is referred to as software RAID; sometimes it is done at the hardware level, in what are called RAID controllers. Embedding the software at the hardware level generally gives better performance because it offloads the job from the OS. Software RAID is available from Veritas and from projects on Sourceforge.net, and Windows has it built in. Hardware RAID comes from Adaptec, HP and others. All the NAS and SAN boxes have RAID controllers built in.

Some people also expand RAID as Redundant Array of Inexpensive Disks (this is in fact the expansion used in the original Berkeley paper), but strictly speaking I don’t like it, because even if I can afford a million $$, a single disk can’t give what RAID offers (data protection and performance).

RAID Terminology

When someone talks about RAID, they need to talk about striping, mirroring and parity. These are the three basic terms used in RAID.

Striping: Striping is nothing but splitting the data and writing the pieces onto multiple hard disks simultaneously. The advantage is this: since hard disks are mechanical devices, there is a limit on the speed at which you can write data to one of them. For example, with a 15K Ultra320 SCSI HDD you can write at most 320MB per second (that is the limit of the interface, the fastest available today, and the disk itself sustains less). What if your application needs more write speed? You split the data and write it in parallel onto multiple hard disks. So with 10 such hard disks you can, in the ideal case, achieve 3200MBps. This amazing piece of striping is implemented in the RAID software program.
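To make the striping idea concrete, here is a minimal sketch in Python. It is only an illustration: the "disks" are plain lists in memory, and the names stripe, unstripe and chunk_size are my own assumptions, not part of any real RAID product.

```python
def stripe(data, num_disks, chunk_size):
    """Split data into chunks and deal them round-robin across the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk_no = offset // chunk_size
        disks[chunk_no % num_disks].append(chunk)
    return disks


def unstripe(disks):
    """Read the chunks back in the same round-robin order they were written."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# Ten bytes, chunks of two bytes, three hypothetical disks.
disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
restored = unstripe(disks)
```

In a real array each list would be a separate physical disk, so the chunks in one row can be written at the same time. That is where the speed comes from.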

Mirroring: As the name indicates, it’s a mirror of the actual data, I mean an exact copy of the data stored on one or more extra hard disks. Mirroring is the term used for writing the same data onto two different hard disks. This gives data protection against hard disk hardware failures. For example, if you want to protect your important files, you write them to two different hard disks, so that if one hard disk fails you can retrieve them from the other. This piece of software is also embedded in the RAID software program. It can be done in two ways: 1) write onto two disks simultaneously, or 2) write onto one disk and then copy from that disk to the other.
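Mirroring can be sketched the same way. The class below is a hypothetical in-memory model (MirroredPair and its methods are names I made up for illustration). It follows the first approach, writing to both disks on every write, and shows that a read still works after one disk dies.

```python
class MirroredPair:
    """Two in-memory 'disks' that always receive the same writes."""

    def __init__(self):
        self.disks = [{}, {}]            # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block_no, data):
        # Approach 1 from the text: write to both disks on every write.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block_no] = data

    def fail(self, disk_index):
        # Simulate a hardware failure: the disk and its contents are gone.
        self.failed[disk_index] = True
        self.disks[disk_index] = {}

    def read(self, block_no):
        # Serve the read from whichever disk is still alive.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block_no]
        raise IOError("both copies lost")


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)                  # first disk dies
survivor = pair.read(0)       # data still comes back from the second disk
```

The price is visible in the model too: every block is stored twice, so half of the raw space goes to the copy.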

Parity: As we have seen above, mirroring needs double the space of the data you want to store in order to give the additional protection. For example, if I have 10 HDDs of data, I need another 10 HDDs to protect them with mirroring. That means it adds to your IT expenses whenever you want to write data. So researchers came up with the concept of parity. It dedicates one hard disk as a parity disk and writes a single bit on the parity disk for every 10 bits written across the 10 data hard disks. It uses XOR to generate this parity bit. For example, if you have written the bits 1,1,0,1,0,0,1,0,1,0 on the ten hard disks, then the bit 1 is written on the parity disk. If I lose any one bit on any one of the 10 hard disks, I can get it back by XORing the 9 surviving bits with the parity bit. So parity protects against the failure of any one disk, just as mirroring does, at the cost of only one extra disk. Access is a little slower while a disk is down, though your data is protected, because every read of the missing data has to execute XOR operations, which is an overhead (and every write has to update the parity too). It’s up to you to decide whether you want parity or mirroring, because there is a commercial side involved as well. This parity program is also embedded into the RAID software program.
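The ten-bit example above can be checked with a few lines of Python. This is just a sketch of the XOR idea; the function names parity and recover are my own.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """XOR all the data bits together to get the parity bit."""
    return reduce(xor, bits, 0)


def recover(surviving_bits, parity_bit):
    """XOR the surviving bits with the parity bit to get the lost bit back."""
    return reduce(xor, surviving_bits, parity_bit)


data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]    # one bit per data disk
p = parity(data)                          # five 1s, an odd count, so the parity bit is 1

lost_index = 3                            # pretend the fourth disk failed
survivors = data[:lost_index] + data[lost_index + 1:]
rebuilt = recover(survivors, p)           # equals data[3]
```

The same trick works whichever single disk is lost, because XORing a value in twice cancels it out.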

As I said earlier, you can install this RAID software program on your server directly, or put it on a microchip dedicated to this job, which frees up CPU cycles that can be used to boost your application performance.

What RAID can offer you?

By this time, knowing striping, mirroring and parity, you should have a fair idea of what RAID can offer you. Using those three beautiful programs separately or in combination, you can achieve the best performance and protection to suit your business needs. I used the words business needs because your choice should be made on a technical and commercial basis, not on a technical basis alone. People in the industry call these combinations RAID levels. Some of the popular RAID levels are:

  • RAID 0 - Striping Only

  • RAID 1 - Mirroring Only

  • RAID 3 - Striping with dedicated Parity

  • RAID 4 - Striping with dedicated Parity (NetApp’s baby, using blocks rather than bytes)

  • RAID 5 - Striping with Rotational Parity

  • RAID 0+1 (01) - First RAID 0, then RAID 1 (from the server’s point of view)

  • RAID 1+0 (10) - First RAID 1, then RAID 0 (from the server’s point of view)

  • RAID DP (Diagonal Parity) - RAID 4 with additional parity which is calculated based on diagonal bits.

Nowadays people tend to use more combinations (levels) of RAID like 5+0, 5+1 and 1+0+1 for some of the benefits these levels can give. But the popular ones are 0, 1, 3, 4, 5, 10, 01 and DP (with the recent NetApp 7g). Let’s discuss what each of these popular RAID levels has to offer you.
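Before going level by level, here is a rough Python sketch of how much usable space each level leaves you. It rests on simplifying assumptions of mine: a single RAID group, two-way mirroring, one disk’s worth of parity for RAID 3, 4 and 5, and two for RAID DP. The function name usable_capacity_gb is made up for the example.

```python
# Disks' worth of space given up to parity per RAID group (assumed: one group).
PARITY_DISKS = {"3": 1, "4": 1, "5": 1, "DP": 2}
# Levels that keep a full second copy of the data (assumed: two-way mirror).
MIRRORED = {"1", "01", "10"}


def usable_capacity_gb(level, num_disks, disk_gb):
    """Return the space left for data on num_disks disks of disk_gb each."""
    if level == "0":
        data_disks = num_disks                      # striping only, nothing given up
    elif level in MIRRORED:
        data_disks = num_disks // 2                 # half the disks hold the copy
    elif level in PARITY_DISKS:
        data_disks = num_disks - PARITY_DISKS[level]
    else:
        raise ValueError("unknown RAID level: " + level)
    return data_disks * disk_gb


for level in ["0", "1", "3", "4", "5", "01", "10", "DP"]:
    print("RAID", level, "->", usable_capacity_gb(level, 10, 100), "GB usable")
```

With the ten 100GB disks from the earlier example, this is the commercial side of the trade-off in numbers: the more protection you ask for, the less of the 1000GB you get to use.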

RAID 0: RAID 0 is striping only. It’s nothing but splitting the data and writing it onto multiple hard disks simultaneously to get better performance. There is no protection for your data: if one disk fails, you will lose your data.


I’m not gonna write out the full advantages and disadvantages; I want you to think. But I’ll write the basic and important ones. Also think about what kind of applications each RAID level can be used for.


Advantages: Better performance for both reads and writes, 100% disk utilization.


Disadvantages: When it comes to data protection, it’s as useless as a single hard disk (worse, in fact, since any one of the disks failing takes the whole array with it). One disk fails and you’re gone; you will have a nightmare recovering your data from tapes.


RAID 1:
RAID 1 is mirroring only. It means that when you write your data, you write the same data to two sets of disks, so that if one disk fails you can start using the other.


Advantages: Best data protection. If you lose one disk, you have the data on the other disk. There is no need to regenerate the actual data from parity in case of disk failure, so it’s the fastest recovery.

Disadvantages: 100% overhead. No write performance gains (reads can still be served from either copy).

RAID 3: RAID 3 is striping with parity. Did you notice that whenever I talked about striping I said splitting the data and writing it onto multiple hard disks, but never mentioned the unit of striping? I mean how many parts the data is split into and what size each piece is. The possible units are bits, bytes and blocks. RAID 3 stripes the data by bytes. That means whenever a write request comes in, it splits the data into small chunks of 1 byte each and writes these chunks onto multiple hard disks in parallel. While doing so it also XORs all the chunks written in parallel to generate a parity byte and writes that to the dedicated parity disk. Confusing? Read it again. It’s nothing but striping with the byte as the unit and writing the parity onto a dedicated disk.

Advantages: Better performance (read/write), data protection, better utilization of hard disks (only 1 disk per RAID group goes to parity; a RAID group is nothing but the set of disks on which RAID is applied).

Disadvantages: Rebuild time in case a disk fails; though you will still be able to access your data, performance is impacted.
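The RAID 3 write path described above (byte-sized chunks plus a parity byte on a dedicated disk) can be sketched like this. It is a toy model under my own assumptions: the disks are bytearrays in memory, short data is padded with zero bytes to fill the last stripe, and raid3_write and raid3_rebuild are hypothetical names.

```python
def raid3_write(data, data_disks):
    """Stripe data byte by byte and build one parity byte per stripe."""
    # Pad with zero bytes so the data fills whole stripes (an assumption).
    if len(data) % data_disks:
        data += bytes(data_disks - len(data) % data_disks)
    disks = [bytearray() for _ in range(data_disks)]
    parity_disk = bytearray()
    for start in range(0, len(data), data_disks):
        stripe = data[start:start + data_disks]
        parity_byte = 0
        for disk, byte in zip(disks, stripe):
            disk.append(byte)        # one byte goes to each data disk
            parity_byte ^= byte      # XOR of all the bytes in this stripe
        parity_disk.append(parity_byte)
    return disks, parity_disk


def raid3_rebuild(disks, parity_disk, failed):
    """Recompute the contents of the failed data disk from the others."""
    rebuilt = bytearray(parity_disk)
    for index, disk in enumerate(disks):
        if index != failed:
            for row, byte in enumerate(disk):
                rebuilt[row] ^= byte
    return rebuilt


disks, parity_disk = raid3_write(b"STORAGE!", data_disks=4)
rebuilt = raid3_rebuild(disks, parity_disk, failed=2)   # same bytes as disks[2]
```

Notice that the rebuild has to read every surviving disk in full. That is the rebuild cost mentioned under the disadvantages.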

RAID 4: This is similar to RAID 3 but with the block, not the byte, as the stripe unit. This has its own advantages, such as better read and write performance compared to RAID 3. It became popular because NetApp uses RAID 4 in their earlier model boxes (still in use).


RAID 5: Unlike RAID 4, which uses a dedicated parity disk, RAID 5 is striping (block level) with rotational parity. There is no dedicated parity disk; each time a new stripe of data is written, a different disk takes its turn to hold the parity block. By doing this it spreads the parity writes over all the disks, so no single parity disk becomes a bottleneck.

Advantages: Better performance (high read/medium write), data protection with better integrity, better utilization of disks, faster rebuild of actual data in case of disk failure.

Disadvantages: If implemented in software, it eats up your CPU and memory; and during the rebuild of actual data after a disk failure performance may be slow, though your data stays accessible.
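The difference between dedicated parity (RAID 4) and rotational parity (RAID 5) is easiest to see as a layout. The Python sketch below prints which disk holds the parity block (P) for each stripe. The rotation rule is just one simple possibility that I picked for illustration; real controllers use various layouts.

```python
def parity_disk_raid4(stripe_no, num_disks):
    """Dedicated parity: always the last disk (an assumed convention)."""
    return num_disks - 1


def parity_disk_raid5(stripe_no, num_disks):
    """Rotational parity: step back one disk per stripe (one possible rotation)."""
    return (num_disks - 1 - stripe_no) % num_disks


def layout(parity_fn, num_stripes, num_disks):
    """Return one row per stripe, marking data blocks as D and parity as P."""
    rows = []
    for stripe_no in range(num_stripes):
        p = parity_fn(stripe_no, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


for name, fn in [("RAID 4", parity_disk_raid4), ("RAID 5", parity_disk_raid5)]:
    print(name)
    for row in layout(fn, num_stripes=4, num_disks=4):
        print(" ", " ".join(row))
```

In the RAID 4 layout the P column is always the same disk, so every write touches that one disk. In the RAID 5 layout the P moves around, so the parity work is shared by all the disks.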


RAID 01:
Here comes the interesting and confusing part. Till now you have seen combinations of striping, mirroring and parity; now it’s time for combinations of the above RAID levels. To keep it simple, use this rule when combining RAID levels (RAID XY): split the hard disks into sets, do RAID X on the disks in each set, and then do RAID Y across those sets. For example, say you have 10 disks and want to do RAID 01, meaning RAID 0 first and then RAID 1. Split the disks into two sets of 5, do RAID 0 on the 5 disks in each set, and then do RAID 1 across the two sets as if they were just two hard disks (remember the basics: once RAID is implemented, it simulates a single disk).

Advantages: Higher data protection, better performance even in case of disk failure.

Disadvantages: 50% utilization of the disks, and one disk failure may mean rebuilding the whole mirror before data protection is restored.
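The RAID XY rule for RAID 01 with 10 disks can be sketched in Python as follows. This is a simplified model with names of my own (raid01_sets, raid01_survives): a striped set is treated as dead as soon as any one of its disks fails, and the mirror survives as long as one whole set is intact.

```python
def raid01_sets(disks, num_sets=2):
    """Split the disks into equal sets; RAID 0 runs inside each set."""
    size = len(disks) // num_sets
    return [disks[i * size:(i + 1) * size] for i in range(num_sets)]


def raid01_survives(stripe_sets, failed):
    """RAID 1 across the sets: the data survives if any one set is untouched."""
    failed = set(failed)
    return any(not (set(stripe_set) & failed) for stripe_set in stripe_sets)


disks = list(range(10))               # disks numbered 0..9
stripe_sets = raid01_sets(disks)      # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

one_failure = raid01_survives(stripe_sets, [2])         # True
same_set = raid01_survives(stripe_sets, [2, 3])         # True, the other set is whole
across_sets = raid01_survives(stripe_sets, [2, 7])      # False, both sets are broken
```

It also shows the disadvantage: after disk 2 fails, the whole first set is out of action, so all five of its disks have to be brought back in sync with the mirror.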

RAID 10: Again apply the same RAID XY formula. In this case we do RAID 1 first and RAID 0 next, so we divide the disks into sets of two, do RAID 1 on each set, and then do RAID 0 across all these sets.

 

Advantages: Higher data protection, better performance even in case of disk failure.

Disadvantages: 50% utilization of the disks.
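For RAID 10 the grouping is the other way round: mirrored pairs first, then a stripe across the pairs. The Python sketch below, again a simplified model with hypothetical names, builds the pairs for 10 disks and counts how many of the possible two-disk failures the array lives through.

```python
from itertools import combinations


def raid10_pairs(disks):
    """Group the disks into mirrored pairs; RAID 0 then runs across the pairs."""
    return [disks[i:i + 2] for i in range(0, len(disks), 2)]


def raid10_survives(pairs, failed):
    """The stripe needs every pair, so each pair must keep at least one live disk."""
    failed = set(failed)
    return all(set(pair) - failed for pair in pairs)


disks = list(range(10))                       # disks numbered 0..9
pairs = raid10_pairs(disks)                   # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

two_disk_failures = list(combinations(disks, 2))
survived = sum(raid10_survives(pairs, failure) for failure in two_disk_failures)
print(survived, "of", len(two_disk_failures), "two-disk failures survived")
```

Only the cases where both failed disks belong to the same pair are fatal, and after a single failure only that one pair needs a rebuild.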

As I mentioned earlier, I don’t want to write detailed advantages and disadvantages; I want you to think about what the advantages are and why. What I mentioned under advantages is very little compared to the full list.

Now that you know the basics of RAID, why don’t you give some thought to where you can apply it and, most importantly, why you think that’s the best for your business. Believe me, there are a lot of application vendors (ISVs) who tell you which RAID to use, but please understand why they are saying so. Do you think a general recommendation given without knowing your business needs (technical requirements and available budget) is viable? If I have a million $$ lying around and want to host a DB server, I go by the technical recommendations and choose RAID 10, but what if I don’t have enough $$ to afford RAID 10? So justifying the investment is important. The best technical recommendations are given by people who are passionate about technology, but they don’t think about whether it is worth the investment or whether you can afford it. So you need to trade off between the best technical and commercial solutions and come to a conclusion about which RAID is right for your business.

Cheers,
Chundi

2 Comments so far

  1. rajkamal June 7th, 2007 4:06 am

    Explained very well without using typical jargons. For amateurs it helps a lot to get an idea of RAID.

    Well done.

  2. Sudhir February 22nd, 2008 2:30 am

    Well explained Chundi…Gave a good idea on RAID in a simple manner.

    Thanks
    Sudhir
