Identifying and Replacing Duplicate Items across rows using SQL Server 2005/2008

A client of mine recently showed me a table that contained a users Registration and Zone information for 3 zones. There was a pattern in the data.

The sample data has been shown below:


-- Create Sample Table


DECLARE @TT table


(


ID int,


RegID int,


ZoneA varchar(5),


ZoneB varchar(5),


ZoneC varchar(5)


)


 


-- Create Sample Data


INSERT INTO @TT VALUES ( 1, 1,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 2, 2,   'A0002','B0002','C0002');


INSERT INTO @TT VALUES ( 3, 3,   'A0003','B0003','C0003');


INSERT INTO @TT VALUES ( 4, 4,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 5, 5,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 6, 6,   'A0006','B0006','C0006');


INSERT INTO @TT VALUES ( 7, 7,   'A0007','B0007','C0007');


INSERT INTO @TT VALUES ( 8, 8,   'A0002','B0002','C0002');


INSERT INTO @TT VALUES ( 9, 9,   'A0009','B0009','C0009');


INSERT INTO @TT VALUES ( 10, 10, 'A0010','B0010','C0010');


INSERT INTO @TT VALUES ( 11, 11, 'A0011','B0011','C0011');


INSERT INTO @TT VALUES ( 12, 12, 'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 13, 13, 'A0013','B0013','C0013');




Users with Registration ID 1, 2 and 3 had Zone data similar to the following:


1    1    A0001    B0001    C0001


2    2    A0002    B0002    C0002


3    3    A0003    B0003    C0003




So going by the pattern, a user with Registration ID 4 should have Zone data


4    4    A0004    B0004    C0004




However that was not the case, as seen in the sample data. There were a few other rows where data of a different registration id had been entered. The requirement however was not to link the data with the pattern, but to find duplicates of existing rows (zones) and mark them with a special character.

Here's how I identified the duplicate rows using CTE(Common Table Expression) and marked them with a '---'

Query


;WITH Dups


AS


(


SELECT *, ROW_NUMBER() OVER (PARTITION BY ZoneA,ZoneB,ZoneC ORDER BY ID) as ZoneData


        FROM @TT


)


SELECT ID, RegID,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneA END AS ZoneA,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneB END AS ZoneB,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneC END AS ZoneC   


FROM Dups


ORDER BY ID




Output


ID    RegID    ZoneA    ZoneB    ZoneC


1    1    A0001    B0001    C0001


2    2    A0002    B0002    C0002


3    3    A0003    B0003    C0003


4    4    --    --    --


5    5    --    --    --


6    6    A0006    B0006    C0006


7    7    A0007    B0007    C0007


8    8    --    --    --


9    9    A0009    B0009    C0009


10    10    A0010    B0010    C0010


11    11    A0011    B0011    C0011


12    12    --    --    --


13    13    A0013    B0013    C0013



About The Author

Suprotim Agarwal, MCSD, MCAD, MCDBA, MCSE, is the founder of DotNetCurry, DNC Magazine for Developers, SQLServerCurry and DevCurry. He has also authored a couple of books 51 Recipes using jQuery with ASP.NET Controls and a new one recently at The Absolutely Awesome jQuery CookBook.

Suprotim has received the prestigous Microsoft MVP award for nine times in a row now. In a professional capacity, he is the CEO of A2Z Knowledge Visuals Pvt Ltd, a digital group that represents premium web sites and digital publications comprising of Professional web, windows, mobile and cloud developers, technical managers, and architects.

Get in touch with him on Twitter @suprotimagarwal, LinkedIn or befriend him on Facebook

1 comment:

MSSM said...

I think that here is a simple "classic" solution which will work on SS2000 also:

update @TT
set
zoneA='--'
,zoneB='--'
,zoneC='--'
FROM @TT t2
where regID>(select min(regID) from @TT t where t.zoneA=t2.zoneA and t.zoneB=t2.zoneB and t.zoneC=t2.zoneC)

You can find a great number of tasks at SQL Exercises