SQL Server 2008 -Change Data Capture (CDC)

Exploring Change Data Capture (CDC) in SQL Server 2008

SQL Server 2008 has introduced ability to track changes to a table with CDC (Change Data Capture). With this, the changes to a table can be logged in another table without writing any trigger or any such mechanism. The changes to a table like insert, update and delete can be logged.

Advantages of using CDC

With SQL 2008 and 2005 the jobs could be done by writing update or after delete triggers. With CDC the logging can be done without writing triggers. Once a source table has been enabled for CDC immediately after the data is written to a transaction log (typical of any database) the capture process reads this log and puts the details in the log. This will help ETL process to query the change by using CDC functions. The change metadata is also written along with the actual change.

Let’s take an example

Following are some code snippets how CDC can be implemented. This code is with respect to Feb CTP of SQL Server 2008 (CTP 6) . Let’s see a small example by creating a database and enabling CDC for a table in it. Before CDC is enabled for a table it needs to be enabled for database.

CREATE DATABASE MyCDCDatabase
GO

USE MyCDCDatabase
GO

EXECUTE sys.sp_cdc_enable_db;
GO

To cross check if CDC is enables query system database
SELECT [name], is_cdc_enabled from sys.databases

The result will show value 1 for the newly created database.



After enabling CDC there are 2 jobs done. A user with the name of cdc is created for the database and a schema with the name cdc is added. If there is already use with cdc enabling CDC can not be done.

CREATE TABLE MyTableCDC(
ID int NOT NULL PRIMARY KEY,
Name nvarchar(20) NOT NULL)
GO

Ensure that SQL Server agent is running. If specified role doesn’t exist it automatically gets created. Running...

EXEC sys.sp_cdc_enable_table
@source_schema=N'dbo',
@Source_name=N'MyTableCDC',
@role_name=N'cdc_admin',
@capture_instance=N'dbo_MyTableCDC',
@supports_net_changes=1;
gives following message :

Job 'cdc.myCDCDatabase_capture' started successfully.

Job 'cdc.myCDCDatabase_cleanup' started successfully.

A list of tables get created in the database which will track all changes. It can be verified by giving the query :

SELECT name, is_tracked_by_cdc from sys.tables

INSERT INTO MyTableCDC VALUES (1,'Gouri')
INSERT INTO MyTableCDC VALUES (2,'Smita')
INSERT INTO MyTableCDC VALUES (3,'Subodh')
INSERT INTO MyTableCDC VALUES (4,'Sarita')

SELECT * fROM dbo.MyTableCDC
SELECT * FROM cdc.dbo_MyTableCDC_CT

After deleting a row from MyTableCDC, the table will be left with 3 rows but the captured table will now have 5 rows,

DELETE from dbo.MyTableCDC where Id=4

SELECT * fROM dbo.MyTableCDC
SELECT * FROM cdc.dbo_MyTableCDC_CT




Now let’s try update for the table :

UPDATE dbo.MyTableCDC SET Name='Gouri S' Where ID=1

SELECT * fROM dbo.MyTableCDC
SELECT * FROM cdc.dbo_MyTableCDC_CT


The result now shows 7 rows in the tracked table. First 4 rows from Insert, 1 from delete and update has 2 rows. One previous to update and one after update.

The functions available to capture the changed data are as follows. The second function gets created if the net changes are set to 1.

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTableCDC');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTableCDC(@from_lsn, @to_lsn, 'all')
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyTableCDC(@from_lsn,@to_lsn,'all')



Get all changes returns all the changes within a range and get net changes returns only the final changes within row.

Conclusion:
CDC can be used with ETL operation for pulling data. Be careful with enabling, as depending upon the net changes, a lot many number of records will be added to the tracked table. Depending upon the tables tracked, the number of tracked tables will also be increased.



2 comments:

Anonymous said...

How do we use CDC for related tables such as a master and detail where master having the PK and some basic details such as created by created on etc and the details having the entity specific attributes?
How do I use the CDC data for a single record modification by joining the master and details CDC tables?
Please help.

Anonymous said...

Hi,

Can you tell how to access functions like "sys.fn_cdc_get_min_lsn" by using a linked server connection.

Thanks.