tag:blogger.com,1999:blog-495177919198719500.post1876044819603740685..comments2023-12-27T20:52:29.483-08:00Comments on Microsoft Sql Server Tutorials: SQL Server Delete Duplicate RowsSuprotim Agarwalhttp://www.blogger.com/profile/08349831623922214390noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-495177919198719500.post-39050905041883051402016-08-01T20:46:33.746-07:002016-08-01T20:46:33.746-07:00Thanks for your solution John. Definitely better t...Thanks for your solution John. Definitely better than the original proposed solution.Suprotim Agarwalhttps://www.blogger.com/profile/08349831623922214390noreply@blogger.comtag:blogger.com,1999:blog-495177919198719500.post-56404719081340323122016-08-01T01:42:27.615-07:002016-08-01T01:42:27.615-07:00Scenario 1
There's no need for the initial SEL...<b>Scenario 1</b><br />There's no need for the initial SELECT INTO for the temp table. Just do it like this:<br />SELECT *<br />FROM customers1<br />GROUP BY<br /> Custid<br />, CustName<br />, CustCity<br />, Passport_Number<br />HAVING COUNT(*) > 1<br /><br />But this method is expensive, involving an INSERT, a DELETE and another INSERT. It hasn't been necessary to do it like this since the days of SQL Server 2000. It also has the fatal flaw that it won't work if you have duplicate IDs with different data. Add this row to your table and see what happens with your technique:<br />INSERT INTO customers1 VALUES (1, 'Betty', 'Huddersfield', 'J467C25')<br /><br />Here is a method that is cheaper and more general:<br />WITH Numbered AS (<br /> SELECT<br /> ROW_NUMBER() OVER (<br /> PARTITION BY Custid,CustName, CustCity, Passport_Number<br /> ORDER BY (SELECT NULL)) AS RowNo<br /> , Custid<br /> , CustName<br /> , CustCity<br /> , Passport_Number<br /> FROM customers1<br /> )<br />DELETE FROM Numbered<br />WHERE RowNo > 1<br /><br /><b>Scenario 2</b><br />This works as far as it goes, although your method will tolerate two people having the same passport number, which I don't imagine is a good thing. Of course, if you had two people with different names but the same passport number, you'd have to make a decision on which one to delete, something you probably can't do in code, so let's stick with the scenario as you describe it for the sake of simplicity. Here is a more concise way of doing it. You can use ASC or DESC in the ORDER BY, depending on whether you want to keep the highest or lowest ID:<br />WITH Numbered AS (<br /> SELECT<br /> ROW_NUMBER() OVER (<br /> PARTITION BY CustName, CustCity, Passport_Number<br /> ORDER BY Custid ASC) RowNo<br /> , Custid<br /> , CustName<br /> , CustCity<br /> , Passport_Number<br /> FROM customers2<br /> )<br />DELETE FROM Numbered<br />WHERE RowNo > 1<br /><br /><a rel="nofollow">John</a>Anonymoushttps://www.blogger.com/profile/17575063041464293930noreply@blogger.com