Identifying and Replacing Duplicate Items across rows using SQL Server 2005/2008

A client of mine recently showed me a table that contained a users Registration and Zone information for 3 zones. There was a pattern in the data.

The sample data has been shown below:

-- Create Sample Table

DECLARE @TT table

( 

ID int, 

RegID int,

ZoneA varchar(5),

ZoneB varchar(5),

ZoneC varchar(5)

)

-- Create Sample Data

INSERT INTO @TT VALUES ( 1, 1,   'A0001','B0001','C0001');

INSERT INTO @TT VALUES ( 2, 2,   'A0002','B0002','C0002');

INSERT INTO @TT VALUES ( 3, 3,   'A0003','B0003','C0003');

INSERT INTO @TT VALUES ( 4, 4,   'A0001','B0001','C0001');

INSERT INTO @TT VALUES ( 5, 5,   'A0001','B0001','C0001');

INSERT INTO @TT VALUES ( 6, 6,   'A0006','B0006','C0006');

INSERT INTO @TT VALUES ( 7, 7,   'A0007','B0007','C0007');

INSERT INTO @TT VALUES ( 8, 8,   'A0002','B0002','C0002');

INSERT INTO @TT VALUES ( 9, 9,   'A0009','B0009','C0009');

INSERT INTO @TT VALUES ( 10, 10, 'A0010','B0010','C0010');

INSERT INTO @TT VALUES ( 11, 11, 'A0011','B0011','C0011');

INSERT INTO @TT VALUES ( 12, 12, 'A0001','B0001','C0001');

INSERT INTO @TT VALUES ( 13, 13, 'A0013','B0013','C0013');

Users with Registration ID 1, 2 and 3 had Zone data similar to the following:

1    1    A0001    B0001    C0001

2    2    A0002    B0002    C0002

3    3    A0003    B0003    C0003

So going by the pattern, a user with Registration ID 4 should have Zone data

4    4    A0004    B0004    C0004

However that was not the case, as seen in the sample data. There were a few other rows where data of a different registration id had been entered. The requirement however was not to link the data with the pattern, but to find duplicates of existing rows (zones) and mark them with a special character.

Here's how I identified the duplicate rows using CTE(Common Table Expression) and marked them with a '---'

Query

;WITH Dups

AS

(

SELECT *, ROW_NUMBER() OVER (PARTITION BY ZoneA,ZoneB,ZoneC ORDER BY ID) as ZoneData

        FROM @TT

)

SELECT ID, RegID,

        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneA END AS ZoneA,

        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneB END AS ZoneB,

        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneC END AS ZoneC    

FROM Dups

ORDER BY ID

Output

ID    RegID    ZoneA    ZoneB    ZoneC

1    1    A0001    B0001    C0001

2    2    A0002    B0002    C0002

3    3    A0003    B0003    C0003

4    4    --    --    --

5    5    --    --    --

6    6    A0006    B0006    C0006

7    7    A0007    B0007    C0007

8    8    --    --    --

9    9    A0009    B0009    C0009

10    10    A0010    B0010    C0010

11    11    A0011    B0011    C0011

12    12    --    --    --

13    13    A0013    B0013    C0013

About The Author

Suprotim Agarwal, MCSD, MCAD, MCDBA, MCSE, is the founder of DotNetCurry, DNC Magazine for Developers, SQLServerCurry and DevCurry. He has also authored a couple of books 51 Recipes using jQuery with ASP.NET Controls and a new one recently at The Absolutely Awesome jQuery CookBook.

Suprotim has received the prestigous Microsoft MVP award for nine times in a row now. In a professional capacity, he is the CEO of A2Z Knowledge Visuals Pvt Ltd, a digital group that represents premium web sites and digital publications comprising of Professional web, windows, mobile and cloud developers, technical managers, and architects.

Get in touch with him on Twitter @suprotimagarwal, LinkedIn or befriend him on Facebook

1 comment:

MSSM said...: I think that here is a simple "classic" solution which will work on SS2000 also:

update @TT
set
zoneA='--'
,zoneB='--'
,zoneC='--'
FROM @TT t2
where regID>(select min(regID) from @TT t where t.zoneA=t2.zoneA and t.zoneB=t2.zoneB and t.zoneC=t2.zoneC)

You can find a great number of tasks at SQL Exercises; April 27, 2009 at 3:38 AM