October 17, 2010

Find and Delete Duplicate Rows From Tables without Primary Key




Someone recently asked me how to find and delete duplicate rows from sample tables, which do not have a Primary Key. I have written one such sample on MSDN code Find and/or Delete Duplicate Rows which I will share here

SAMPLE Data

-- Suppress data loading messages
SET NOCOUNT ON

-- Create Table
CREATE TABLE #Customers (ID integer, CustName varchar(20), Pincode int)

-- Load Sample Data in Table
INSERT INTO #Customers VALUES (1, 'Jack',45454 )
INSERT INTO #Customers VALUES (2, 'Jill', 43453)
INSERT INTO #Customers VALUES (3, 'Tom', 43453)
INSERT INTO #Customers VALUES (4, 'Kathy', 22343)
INSERT INTO #Customers VALUES (5, 'David', 65443)
INSERT INTO #Customers VALUES (6, 'Kathy', 22343)
INSERT INTO #Customers VALUES (7, 'Kim', 65443)
INSERT INTO #Customers VALUES (8, 'Hoggart', 33443)
INSERT INTO #Customers VALUES (9, 'Kate', 61143)
INSERT INTO #Customers VALUES (10, 'Kim', 65443)

To indentify duplicate rows, the trick is to Group the rows by CustName, Pincode. Rows having similar CustName and Pincode will have more than one rows in the grouping. So locate them using HAVING COUNT(*) > 1. If duplicate values are encountered, return the maximum ID for each duplicate row. Using the outer query, delete any ID returned by subquery.

-- Find Duplicate Rows
SELECT MAX(ID) as ID, CustName, Pincode FROM #Customers
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1

-- Delete Duplicate Rows
DELETE FROM #Customers
WHERE ID IN
( SELECT MAX(ID) FROM #Customers
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1)
SELECT * FROM #Customers

OUTPUT

image 


Did you like this post?
kick it on DotNetKicks.com
subscribe via rss subscribe via e-mail
print this post follow me on twitter


About The Author

Suprotim Agarwal, ASP.NET Architecture MVP works as an Architect Consultant and provides consultancy on how to design and develop Web applications.

Suprotim is also the founder and primary contributor to DevCurry, DotNetCurry and SQLServerCurry. He has also written an EBook 51 Recipes using jQuery with ASP.NET Controls.

Follow him on twitter @suprotimagarwal

 
  Feedback:

comments

4 Responses to "Find and Delete Duplicate Rows From Tables without Primary Key"
  1. Robert Seso said...
    October 18, 2010 at 6:05 AM

    I'd change ...WHERE ID IN (...) to ...WHERE ID NOT IN (...), that way you'd get rid of all the duplicates, regardless how many there are.

  2. Federico Emi said...
    October 18, 2010 at 9:12 AM

    Robert adding NOT IN will delete the rows that are 'not duplicate' and that's exactly the opposite of what the author describes.

    If you are suggesting an alternate query, please elaborate since I get the desired results using the query shown in this article

  3. Anonymous said...
    October 18, 2010 at 5:09 PM

    This won't work if there is more than one duplicate as it only deletes the duplicate with the largest ID field.

  4. darlove said...
    November 4, 2010 at 9:48 AM

    To satisfy the requirement that all dups are deleted, no matter how many there are, use the row_number() window function. Number the rows within the same group according to your id field in an ascending order, and then delete all rows that have the generated row_number() > 1. This way you'll be left with no dups and it's all in one go.

 

Copyright © 2009-2014 All Rights Reserved for SQLServerCurry.com by Suprotim Agarwal | Terms and Conditions