April 17, 2009

Identifying and Replacing Duplicate Items across rows using SQL Server 2005/2008




A client of mine recently showed me a table that contained a users Registration and Zone information for 3 zones. There was a pattern in the data.

The sample data has been shown below:


-- Create Sample Table


DECLARE @TT table


(


ID int,


RegID int,


ZoneA varchar(5),


ZoneB varchar(5),


ZoneC varchar(5)


)


 


-- Create Sample Data


INSERT INTO @TT VALUES ( 1, 1,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 2, 2,   'A0002','B0002','C0002');


INSERT INTO @TT VALUES ( 3, 3,   'A0003','B0003','C0003');


INSERT INTO @TT VALUES ( 4, 4,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 5, 5,   'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 6, 6,   'A0006','B0006','C0006');


INSERT INTO @TT VALUES ( 7, 7,   'A0007','B0007','C0007');


INSERT INTO @TT VALUES ( 8, 8,   'A0002','B0002','C0002');


INSERT INTO @TT VALUES ( 9, 9,   'A0009','B0009','C0009');


INSERT INTO @TT VALUES ( 10, 10, 'A0010','B0010','C0010');


INSERT INTO @TT VALUES ( 11, 11, 'A0011','B0011','C0011');


INSERT INTO @TT VALUES ( 12, 12, 'A0001','B0001','C0001');


INSERT INTO @TT VALUES ( 13, 13, 'A0013','B0013','C0013');




Users with Registration ID 1, 2 and 3 had Zone data similar to the following:


1    1    A0001    B0001    C0001


2    2    A0002    B0002    C0002


3    3    A0003    B0003    C0003




So going by the pattern, a user with Registration ID 4 should have Zone data


4    4    A0004    B0004    C0004




However that was not the case, as seen in the sample data. There were a few other rows where data of a different registration id had been entered. The requirement however was not to link the data with the pattern, but to find duplicates of existing rows (zones) and mark them with a special character.

Here's how I identified the duplicate rows using CTE(Common Table Expression) and marked them with a '---'

Query


;WITH Dups


AS


(


SELECT *, ROW_NUMBER() OVER (PARTITION BY ZoneA,ZoneB,ZoneC ORDER BY ID) as ZoneData


        FROM @TT


)


SELECT ID, RegID,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneA END AS ZoneA,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneB END AS ZoneB,


        CASE WHEN ZoneData <> 1 then '--' ELSE ZoneC END AS ZoneC   


FROM Dups


ORDER BY ID




Output


ID    RegID    ZoneA    ZoneB    ZoneC


1    1    A0001    B0001    C0001


2    2    A0002    B0002    C0002


3    3    A0003    B0003    C0003


4    4    --    --    --


5    5    --    --    --


6    6    A0006    B0006    C0006


7    7    A0007    B0007    C0007


8    8    --    --    --


9    9    A0009    B0009    C0009


10    10    A0010    B0010    C0010


11    11    A0011    B0011    C0011


12    12    --    --    --


13    13    A0013    B0013    C0013



Did you like this post?
kick it on DotNetKicks.com
subscribe via rss subscribe via e-mail
print this post follow me on twitter


About The Author

Suprotim Agarwal, ASP.NET Architecture MVP works as an Architect Consultant and provides consultancy on how to design and develop Web applications.

Suprotim is also the founder and primary contributor to DevCurry, DotNetCurry and SQLServerCurry. He has also written an EBook 51 Recipes using jQuery with ASP.NET Controls.

Follow him on twitter @suprotimagarwal

 
  Feedback:

comments

1 Response to "Identifying and Replacing Duplicate Items across rows using SQL Server 2005/2008"
  1. MSSM said...
    April 27, 2009 at 3:38 AM

    I think that here is a simple "classic" solution which will work on SS2000 also:

    update @TT
    set
    zoneA='--'
    ,zoneB='--'
    ,zoneC='--'
    FROM @TT t2
    where regID>(select min(regID) from @TT t where t.zoneA=t2.zoneA and t.zoneB=t2.zoneB and t.zoneC=t2.zoneC)

    You can find a great number of tasks at SQL Exercises

 

Copyright © 2009-2014 All Rights Reserved for SQLServerCurry.com by Suprotim Agarwal | Terms and Conditions