Fastest Way to Update Rows in a Large Table in SQL Server

Many a times, you come across a requirement to update a large table in SQL Server that has millions of rows (say more than 5 millions) in it. In this article I will demonstrate a fast way to update rows in a large table

Consider a table called test which has more than 5 millions rows. Suppose you want to update a column with the value 0, if it that column contains negative value. Let us also assume that there are over 2 million row in that column that has a negative value.

The usual way to write the update method is as shown below:

UPDATE test
SET col=0
WHERE col<0

The issue with this query is that it will take a lot of time as it affects 2 million rows and also locks the table during the update.

You can improve the performance of an update operation by updating the table in smaller groups. Consider the following code:

Update Large Table

The above code updates 10000 rows at a time and the loop continues till @@rowcount has a value greater than zero. This ensures that the table is not locked.

Best practices while updating large tables in SQL Server

1. Always use a WHERE clause to limit the data that is to be updated

2. If the table has too many indices, it is better to disable them during update and enable it again after update

3. Instead of updating the table in single shot, break it into groups as shown in the above example.

You may also want to read my article Find the Most Time Consuming Code in your SQL Server Database


About The Author

Madhivanan,an MSc computer Science graduate from Chennai-India, works as a works as a Lead Subject Matter Expert at a company that simplifies BIG data. He started his career as a developer working with Visual Basic 6.0, SQL Server 2000 and Crystal Report 8. As years went by, he started working more on writing queries in SQL Server. He now has good level of knowledge in SQLServer, Oracle, MySQL and PostgreSQL as well. He is also one of the leading posters at www.sqlteam.com and a moderator at www.sql-server-performance.com. His T-sql blog is at http://beyondrelational.com/blogs/madhivanan

10 comments:

illearth said...

According to the documentation using SET ROWCOUNT is deprecated in future versions and it's suggested you use TOP. So this could be re-written as:

update top(10000) test set col=0 where col<0;

while @@rowcount > 0
begin
update top(10000) test set col=0 where col<0;
end

Anonymous said...

Why are the SQL team removing SET ROWCOUNT? We have so many queries that use the ROWCOUNT. does that mean if sql server 2010 or 2011 whatever comes out we change all the queries?

thank you for the link illearth.

Anonymous said...

Wow, I guess I'll rant a bit because I keep visiting these supposed SQL guru blogs to find out that they contain something that is elementary or not even usable...

2 million rows production table update-->

Declare @i int

Select @i = min(id) from tableToUpdate Where ?????

While IsNull(@i,0)>0
Begin
Update tableToUpdate set col = 'something' where id = @i

Select @i = id from tableToUpdate where id > @id

if @i%10000 print 'Updated 10k rows'

End

I know this may not be the fastest way, but it is reliable and safe. I have used it in production CRM systems with tables that contain 50 million rows, and I have gotten not calls from users re: unforseen performance issues in the CRM system (table locks, excessive I/O, etc.)

Unknown said...

Your rant is more or less the same what the author described. Infact I found the original post to be easier to read ;)

illearth that's real good tip about using TOP instead of ROWCOUNT

M. Aamir Maniar said...

Good one, Madhivanan...

-
www.technobits.net

Michael Brönnimann said...

LARGE VOLUME DATA MODIFICATIONS

Parallelization for queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large volume data modifications (UPDATE, INSERT, DELETE), the standard engine does parallelize towards best use of all available resources (disk, multiple cpu-cores, etc.).

Therefore you may have a look into the approach of SQL Parallel Boost at
http://www.ibax.ch/-IBX-/plain.simpleimagetitletextlinklist.en.solutions.products.parallelboost/default.aspx

This approach can also be used to execute multiple SQL statements in parallel.

A purely SQL engine related parallelisation solution takes advantage of minimized complexity and has no 'external' components like SSIS involved, Furthermore it's the best performing solution regarding task splitting and synchronization, as it hasn't potential connection and communication overhead. The overall performance gain thru parallelisation with SQL Parallel Boost is up to 10 !

In case you don't wan't to rebuild your own solution, SQL Parallel Boost provides a self-contained pure T-SQL based solution, which can be easily embedded in existing applications and ETL process tasks.

Anonymous said...

Will this same method work when using a DELETE statement or INSERT intead of UPDATE?

Michael Brönnimann said...

SQL Parallel Boost supports UPDATE, DELETE and INSERT alike.

Michael Brönnimann said...

A free Community Edition of SQL Parallel Boost can be downloaded at SQL Parallel Boost / CodePlex

caminomaster said...

I need to update a column t1.PRODUCT with new values, from t2.OLD to t2.NEW...

there are cases like:
OLD NEW
21 35
35 44

I need to know, if there's no risk for that script to turn 21 into 35, and later 35 into 44, overwriting it's previous update.