Importance of RAID in Databases

This article was modified on August 31st, 2016.

RAID is a disk system that combines multiple disk drives, called an array, to provide greater performance, fault tolerance, and storage capacity at a moderate cost. While configuring your server system, you typically have to choose between hardware RAID and software RAID for the server’s internal disk drives.

RAID System

RAID systems are widely used as storage solutions to get the best I/O performance, depending on whether the application is write intensive or read intensive. DBA Ray Rankins mentions in his book that for database-related applications, in order to minimize disk head movement and maximize I/O performance, it’s a good practice to spread random I/Os (data changes) and sequential I/Os (for the transaction log) across different disk subsystems. I agree with and endorse his view, as SQL Server, or for that matter any other database, is very much an I/O-intensive system.

I will be using SQL Server as an example to explain the importance of RAID in databases; however, you can apply these concepts to your choice of database. The concepts remain more or less the same.

Although RAID is not a part of a database like SQL Server, implementing RAID can directly affect the way SQL Server performs. There are many RAID levels available, such as RAID 0, RAID 1, RAID 3, RAID 4, RAID 5, RAID 6, RAID 10 and RAID 01. In this article, we will discuss the ones you will likely encounter as a SQL Server DBA, i.e. RAID levels 0, 1, 5 and 10, and also discuss their advantages and disadvantages from a fault tolerance and performance perspective.

Note: RAID is not a replacement for backups. Backups are essential for any system.

Different RAID Levels (Advantages and Disadvantages)


We will discuss only RAID 0, 1, 5 and 10, from a database perspective.

RAID 0 – Also known as Disk Striping, RAID 0 does not provide redundancy or fault tolerance but instead writes data to two or more drives in an alternating fashion. This provides the best read/write I/O performance. If you had two drives and 8 chunks of data, for example, chunks 1, 3, 5, and 7 would be written to the first drive, and chunks 2, 4, 6, and 8 would be written to the second drive, all in a fixed (sequential) order. RAID 0 has a simple design, is easy to implement and has no parity overhead. The drawback is that any piece of data is on only one disk, so if one disk fails, all data in the array is lost.
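
The alternating placement described above can be sketched in a few lines of Python. This is only an illustration of round-robin striping, not how any particular controller is implemented; the function name `stripe_chunks` and the use of plain integers as "chunks" are assumptions made for the example.

```python
def stripe_chunks(chunks, num_drives=2):
    """Distribute chunks round-robin across drives, the way RAID 0 striping does.

    Hypothetical helper for illustration: each drive is modelled as a list.
    """
    drives = [[] for _ in range(num_drives)]
    for index, chunk in enumerate(chunks):
        # Chunk 0 goes to drive 0, chunk 1 to drive 1, chunk 2 back to drive 0, ...
        drives[index % num_drives].append(chunk)
    return drives


layout = stripe_chunks(list(range(1, 9)))
print(layout)  # [[1, 3, 5, 7], [2, 4, 6, 8]]
```

Because each chunk lives on exactly one list (drive), dropping either list loses half of every file, which is why a single disk failure destroys a RAID 0 array.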

RAID 1 – Also known as Disk Mirroring, RAID 1 provides a redundant, identical copy of a selected disk and thus provides good fault tolerance. It can be implemented with 2 drives. The disadvantage is that it has a big storage overhead, since only half of the raw capacity is usable, and therefore a high cost/capacity ratio.

RAID 5 – Also known as Disk Striping with Parity, RAID 5 stripes data across multiple drives and distributes parity information across the same drives. Data redundancy is provided by the parity information. It can be implemented with 3 or more disks and is a popular choice amongst DBAs. Because the data in a stripe and its parity are always stored on different disks, the contents of any single failed disk can be recomputed from the remaining disks. If one disk fails, just replace it with a new disk and the array rebuilds itself. RAID 5 has a high read rate and makes good use of capacity. The drawback of RAID 5 is slower write rates, since parity must be updated on every write, and slow rebuild times.
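
To see why parity allows a failed disk to be rebuilt, here is a minimal Python sketch of XOR parity over a single stripe. It assumes three data blocks plus one parity block and ignores the way real RAID 5 rotates parity between disks; the helper name `xor_blocks` and the sample data are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each on a different (hypothetical) disk.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)  # stored on a fourth disk for this stripe

# Simulate losing the disk that held the second block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])  # True
```

The same property explains the write cost: changing any data block means the parity block for that stripe has to be recomputed and rewritten as well.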

RAID 10 – Also known as mirroring with striping, RAID 10 is a combination of RAID 1 + RAID 0. This array level uses at least four hard disks, and additional disks must be added in even numbers. The data is first placed into mirrored pairs at the lower level. Next, the controller stripes the data across those mirrored pairs to form a single logical volume. (The reverse arrangement, where one striped set is mirrored to another identical striped set, is RAID 01.) Since RAID 10 has no parity to calculate, it performs best among the fault-tolerant levels for write-intensive applications (like video editing). The drawback is that it is expensive, since only half of the raw capacity is usable.
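
The capacity trade-offs of the four levels can be compared with a small calculation. The sketch below is a simplification that assumes identical disks and treats RAID 1 as a single mirror set; the function name `usable_capacity_gb` and the disk sizes are hypothetical.

```python
def usable_capacity_gb(level, num_disks, disk_size_gb):
    """Return usable capacity for RAID 0, 1, 5 or 10 built from identical disks."""
    if level == 0:
        if num_disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return num_disks * disk_size_gb          # no redundancy, all space usable
    if level == 1:
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_gb                      # every disk holds the same copy
    if level == 5:
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_size_gb    # one disk's worth goes to parity
    if level == 10:
        if num_disks < 4 or num_disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (num_disks // 2) * disk_size_gb   # half the disks are mirrors
    raise ValueError("unsupported RAID level")


for level in (0, 5, 10):
    print(level, usable_capacity_gb(level, 4, 500))
```

With four 500 GB disks this gives 2000 GB for RAID 0, 1500 GB for RAID 5 and 1000 GB for RAID 10, which is the cost side of the fault tolerance each level buys.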

Which RAID is Suitable for my Database?


Now that you have an overview of RAID levels, let’s look at which RAID is suitable for a database. The answer to this question depends on a variety of factors. Is your priority availability, performance or cost? What are your requirements for fault tolerance and performance? Here’s a quick snapshot of the performance and fault tolerance provided by RAID:

[Figure: sqlserver-raid-performance (performance and fault tolerance by RAID level)]

When it comes to a database like SQL Server, no single RAID level will suit every need. In most cases, SQL Server performs large reads and small writes. For databases where write operations dominate, RAID 5 is not a good choice. By contrast, RAID 10 is a good option for databases with more write operations.
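
A rough way to quantify the difference is the "write penalty" rule of thumb, which is not from this article but is commonly used in storage sizing: each logical write costs about 2 physical I/Os on RAID 1 and RAID 10 (one per mirror) and about 4 on RAID 5 (read data, read parity, write data, write parity). The sketch below applies those assumed figures and ignores controller caching, so treat the numbers as indicative only.

```python
# Assumed physical I/Os per logical write (rule-of-thumb values, not measurements).
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 10: 2}


def effective_iops(level, raw_iops, write_fraction):
    """Estimate the logical IOPS an array can serve for a given read/write mix.

    raw_iops is the combined physical IOPS of all disks in the array;
    write_fraction is the share of logical I/Os that are writes (0.0 to 1.0).
    """
    read_fraction = 1.0 - write_fraction
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])


# Same disks, 50% writes: RAID 10 serves noticeably more logical I/O than RAID 5.
print(round(effective_iops(5, 1000, 0.5)))   # 400
print(round(effective_iops(10, 1000, 0.5)))  # 667
```

For a read-only workload (`write_fraction=0.0`) the estimate is the same for every level, which matches the observation that RAID 5 is a reasonable fit for read-heavy data.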

Here are some points and best practices to keep in mind when deciding on the RAID system for your database.
  • Your data files, log files, tempdb, and backups should be on separate physical drives or sets of disks (arrays).
  • RAID 1 is often chosen to store operating systems, binaries, index filegroups and database transaction log files. Fault tolerance and good write speed are critical to log and index performance.
  • Since log files are written sequentially and are read only for recovery operations, the recommended RAID level for log files is RAID 1 or 10. If your RAID 1 array is at 100% usage, choose RAID 10 for better performance.
  • For data files with random access and read-heavy data volumes, striping is important, so the recommended RAID level is 5 or 10.
  • For data files that require good write performance, RAID 10 is recommended. Use a battery-backed caching RAID controller for better write performance.
  • For tempdb files, which need good read/write performance, RAID 0, 1 or 10 is recommended. Although tempdb contains temporary data and DBAs often opt for RAID 0 for tempdb, keep in mind that SQL Server requires tempdb to carry out many of its activities. So if you want your system to be always available, rethink RAID 0.
  • Putting log files or tempdb on a RAID 5 array is not recommended, since RAID 5 does not perform well for write operations. DBAs do have conflicting opinions on this point, though.
  • Choose small, fast drives over large, slow drives.
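
The recommendations in the list above can be condensed into a small lookup, shown here as a Python sketch. The dictionary keys and the `recommend` function are hypothetical names chosen for the example, and the table only restates the points above rather than adding new guidance.

```python
# RAID levels suggested in the list above, keyed by the kind of file stored.
RECOMMENDED_RAID = {
    "log": [1, 10],
    "data_read_heavy": [5, 10],
    "data_write_heavy": [10],
    "tempdb": [0, 1, 10],
}


def recommend(file_type, need_fault_tolerance=True):
    """Return candidate RAID levels, dropping RAID 0 when availability matters."""
    levels = RECOMMENDED_RAID[file_type]
    if need_fault_tolerance:
        # RAID 0 has no redundancy, so exclude it for always-available systems.
        levels = [level for level in levels if level != 0]
    return levels


print(recommend("tempdb"))                              # [1, 10]
print(recommend("tempdb", need_fault_tolerance=False))  # [0, 1, 10]
```
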
Make sure you do your own research, so you can make an informed decision! In a real scenario, you may not have the luxury of deciding your server configuration because of the cost associated with it. Keeping these points in mind may help you in such scenarios.

Please use the comments section and share your opinions!


About The Author

Suprotim Agarwal, MCSD, MCAD, MCDBA, MCSE, is the founder of DotNetCurry, DNC Magazine for Developers, SQLServerCurry and DevCurry. He has also authored a couple of books: 51 Recipes using jQuery with ASP.NET Controls and, more recently, The Absolutely Awesome jQuery CookBook.

Suprotim has received the prestigious Microsoft MVP award nine times in a row now. In a professional capacity, he is the CEO of A2Z Knowledge Visuals Pvt Ltd, a digital group that represents premium web sites and digital publications comprising professional web, Windows, mobile and cloud developers, technical managers, and architects.

Get in touch with him on Twitter @suprotimagarwal, LinkedIn or befriend him on Facebook

5 comments:

Anonymous said...

Greate post. I was wondering this topic a few day before. Could you possibly go deeper on this topic??

Suprotim Agarwal said...

OK! I will see if I get time this week to author a deep dive on RAID. Thanks for the feedback.

Unknown said...

Thanks Mr Agarwal ...
It was best explaination I could search
Good work and keep going .
Good wishes

Unknown said...

Great Explanation! Thank you very much

Suprotim Agarwal said...

You are welcome Abhishek and Neal :)