RAID is a disk system that contains multiple disk drives, called an array, to provide greater performance, fault tolerance, storage capacity, at a moderate cost. While configuring your server system, you typically have to make a choice between hardware RAID and software RAID for the server’s internal disk drives
RAID systems are widely used as storage solutions to get the best I/O performance, depending whether the application is write intensive or read intensive. DBA RayRankins mentions in his book that for database-related applications, in order to minimize disk head movement and maximize I/O performance, it’s a good practice to spread random I/O’s (data changes) and sequential I/O’s (for the transaction log) across different disk subsystems. I agree and endorse his view, as SQL Server, or for that matter any other database, is very much an I/O intensive system.
I will be using SQL Server as an example to explain the importance of RAID in databases, however you can implement these concepts in your choice of database. The concepts more or less, remain the same
Although RAID is not a part of a database like SQL Server, implementing RAID can directly affect the way SQL Server performs.There are many RAID arrays available such as RAID 0, RAID 1, RAID 3, RAID 4, RAID 5, RAID 6, RAID 10 and RAID 01. In this article, we will discuss the ones you will likely encounter as a SQL Server DBA i.e. RAID levels 0, 1, 5 and 10 and also discuss their advantages and disadvantages from a fault tolerance and performance perspective
Note: RAID is not a replacement for backups. Backups are very essential for any system.
Different RAID Levels (Advantages and Disadvantages)
We will discussing only RAID 0, 1, 5 and 10 (database perspective) .
RAID 0 – Also known as Disk Striping, RAID 0 does not provide redundancy or fault tolerance but instead writes data to two drives, in an alternating fashion. This provides the best read write I/O performance. If you had 8 chunks of data, for example, chunk 1, 3, 5, and 7 would be written to the first drive, and chunk 2, 4, 6, and 8 would be written to the second drive, but all in a fixed (sequential) order. RAID 0 has a simple design, easier to implement and no overheads for parity. The drawback is that any piece of data is on only one disk, so if one disk fails, data stored within those disks are lost.
RAID 1- Also known as Disk Mirroring, RAID 1 provides a redundant, identical copy of a selected disk and thus provides good fault tolerance. It can be implemented with 2 drives. The disadvantage is that it has a big storage overhead and a high cost/capacity ratio
RAID 5 - Also known as Disk Striping with Parity, stripes data across multiple drives and writes parity bits across drives. Data redundancy is provided by the parity information. It can be implemented with 3 or more disks and is a popular choice amongst DBA’s. Since data and parity information are arranged on the disk array, two types of information are always on different disks. If one disk fails, just replace it with a new disk and the array rebuilds itself. RAID 5 has a higher read rate and makes good use of capacity. The drawback of RAID 5 is slower write rates and slow rebuild times.
RAID 10 - Also known as mirroring with striping, RAID 10 is a combination of RAID1 + RAID0. RAID 10 uses a striped array of disks that are then mirrored to another identical set of striped disks. This array level uses at least four hard disks and additional disks must be added in even numbers. The data is first placed into mirrored pairs at the lower level. Next, the controller selects a member from each mirrored pair and stripes the data into a new logical volume. Since RAID 10 writes in a random fashion, it provides best performance with a write-intensive application (like video editing). The drawback is that it is expensive.
Which RAID is Suitable for my Database?
Now that you have an overview of RAID levels, let’s look at which RAID is suitable for a database. The answer to this question depends on a variety of factors. Do you want availability, performance or cost? What are your requirements for fault tolerance and performance? Here’s a quick snapshot of the performance and fault tolerance provided by RAID
When it comes to a database like SQL Server, no one RAID level will suit your need. In most cases, SQL Server performs large reads and small writes. So for databases, where write operations are more, RAID 5 is not a good choice. On the contrary, RAID 10 is a good option for databases with more write operations.
Here are some points and best practices to keep in mind while deciding the RAID system for your database.
- Your Data, Logs, tempdb, and backups should be on separate physical drives or a set of disks (array).
- RAID1 is often chosen to store operating systems, binaries, index file groups and database transaction log files. Critical to log and index performance is fault tolerance and a good write speed.
- Since log file are written sequentially and are read only for recovery operations, recommended RAID for Log Files is RAID 1 or 10. If your RAID 1 is at a 100% usage, choose RAID 10 for better performance.
- For data files with random access and read heavy data volumes, striping is important. So recommended RAID is 5 or 10.
- For data files which require good write performance, RAID 10 is recommended. Use a battery backed caching RAID controller for better write performance
- For tempdb files with good read/write performance RAID 0, 1 or 10 is recommended. Although tempdb contains temporary data and DBA’s often go in for RAID 0 for tempdb, keep in mind that SQL Server requires tempdb to carry out many of its activities. So if you want your system to be always available, rethink RAID 0.
- Putting log files or tempdb on a RAID 5 array is not recommended, since RAID 5 does not perform well for write operations. DBA's although have conflicting opinions over this point.
- Choose small fast drives, over large slow drives.
Make sure you do your own research, so you can make an informed decision! In a real scenario, you may not have the luxury of deciding your server configurations due to the cost associated with it. So keeping these points in mind, may help you in such scenarios.
Please use the comments section and share your opinions!