May 01, 2009

Delete Random Records From a Table Using SQL Server 2005/2008




Sometime back I had written a simple post demonstrating How To Return Random Records From A Table.

Recently I came across a cool query by Peter where he displayed how to do it based on a category. I have modified the query to show you how random records can be displayed from a table, partitioned on a category. We will also see how to delete these records.

SAMPLE DATA


DECLARE    @ErrorLog TABLE


(


    ErrorID INT,


    ErrorCategory INT,


    ErrorDate DATETIME


)


 


INSERT    @ErrorLog


SELECT    101, 1, '20080217' UNION ALL


SELECT    102, 1, '20080217' UNION ALL


SELECT    103, 1, '20080217' UNION ALL


SELECT    104, 1, '20080217' UNION ALL


SELECT    105, 1, '20080217' UNION ALL


SELECT    106, 1, '20080217' UNION ALL


SELECT    107, 2, '20080217' UNION ALL


SELECT    108, 2, '20080217' UNION ALL


SELECT    109, 1, '20080217' UNION ALL


SELECT    110, 2, '20080217' UNION ALL


SELECT    111, 2, '20080217' UNION ALL


SELECT    112, 1, '20080217'




If you observe, there are 8 records of ErrorCategory 1 and 4 records of ErrorCategory 2. In order to display 50% random records from both Error Category 1 and 2, use this query

QUERY to Select Random Records based on a category


SELECT *


FROM   


(


    SELECT   


    ErrorID, ErrorCategory,


    ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,


    COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID


    FROM    @ErrorLog


) AS TruncTbl


WHERE    .50 * maxID >= recID




RESULTS


ErrorID    ErrorCategory    recID    maxID


102        1                1        8


106        1                2        8


104        1                3        8


103        1                4        8


108        2                1        4


107        2                2        4




Similarly, now if you have to delete random records in a table based on a category, here's how to do so

QUERY to Delete Random Records from a table based on a category


DELETE    TruncTbl


FROM   


(


    SELECT   


    ErrorID,


    ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,


    COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID


    FROM    @ErrorLog


) AS TruncTbl


WHERE    .50 * maxID >= recID




RESULTS


ErrorID    ErrorCategory    ErrorDate


103        1                2008-02-17 00:00:00.000


104        1                2008-02-17 00:00:00.000


108        2                2008-02-17 00:00:00.000


109        1                2008-02-17 00:00:00.000


111        2                2008-02-17 00:00:00.000


112        1                2008-02-17 00:00:00.000




Observe that Both ErrorCategory 1 and 2 had even set of records (8 and 4 each). If there are odd number of records, the results will differ.

Note: SQL Server 2005/2008 also has the TABLESAMPLE clause that can be used to do random sampling. I will explain this feature in my next post


Did you like this post?
kick it on DotNetKicks.com
subscribe via rss subscribe via e-mail
print this post follow me on twitter


About The Author

Suprotim Agarwal, ASP.NET Architecture MVP works as an Architect Consultant and provides consultancy on how to design and develop Web applications.

Suprotim is also the founder and primary contributor to DevCurry, DotNetCurry and SQLServerCurry. He has also written an EBook 51 Recipes using jQuery with ASP.NET Controls.

Follow him on twitter @suprotimagarwal

 
  Feedback:

comments

0 Responses to "Delete Random Records From a Table Using SQL Server 2005/2008"
 

Copyright © 2009-2014 All Rights Reserved for SQLServerCurry.com by Suprotim Agarwal | Terms and Conditions