SQL Server 2008 introduces Sparse Columns

SQL Server 2008 introduces a new concept of sparse columns with a table, for providing no physical usage for a column which may have NULL data.

Let’s see what benefit this feature offers us with respect to the memory required.

If there is a single table with thousands of rows and we have a single filed with some sort of flag which might have value only for some 100 records, it is quite memory consuming, even if we take bit as the data type. With sparse columns there can be multiple columns with a single table with no physical memory taken if they are declared as sparse.

To explain this further, we will take an example where Sparse Column can be implemented. Let us take a case of Registration Data for a website. While many people may register, not many would like to subscribe to some newsletter we may want to send. For those who would like to receive such newsletter, we keep a flag. Since the population of such people will be less, the flag may not hold any value for majority of the records but is still going to consume space in each record.

Let’s see some code snippets how this feature can be accomplished. This is with reference to Feb CTP SQL 2008 (CTP 6)

Sparse Columns can be created with CREATE TABLE or ALTER TABLE statements. It is like an ordinary column with SPARSE as the keyword. Sparse column can not be associated with some data types like text, NText or image. It also can not have properties like ROWGUID, IDENTITY or FILESTREAM.


Following is the code snippet to create a table with sparse columns in it

CREATE TABLE SparseColumns

(ID int IDENTITY PRIMARY KEY,

Col1 int SPARSE,

Col2 int SPARSE,

Col3 int SPARSE,

Col4 int SPARSE,

Col5 int SPARSE,

Col6 int SPARSE,

Col7 int SPARSE,

Col8 int SPARSE,

Col9 int SPARSE,

Col10 int SPARSE)

INSERT INTO Sparsecolumns

(col1, col4)

VALUES (1,4)

SELECT * FROM SparseColumns will give a result with all columns displayed.


If there are a lot of sparse columns with a single table, the output can be really messy. In order to avoid it you can use Columns Sets associated with Sparse Columns. A column set is an untyped XML representation which combines all sparse columns together to form a better output. Once a column set added, it can not be changed. The table needs to be dropped and re- created. After adding column set, the output of select query will not return individual columns but will return XML representation.

A column set can be added to a table with CREATE TABLE statement and can not be done with ALTER TABLE


CREATE TABLE SparseColumns

(ID int identity primary key,

SparseColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS,

Col1 int Sparse,

Col2 int Sparse)


Summary

Using sparse columns is a good way to reduce physical storage. This feature will be useful in following scenarios

  • Storage space is a critical criteria (with sparse columns the disk space can be saved from 20 to 40 %)
  • If there are a lot of columns in a table which are likely to have NULL values (more than 50%)


No comments: