Delete Random Records From a Table Using SQL Server 2005/2008

Sometime back I had written a simple post demonstrating How To Return Random Records From A Table.

Recently I came across a cool query by Peter where he displayed how to do it based on a category. I have modified the query to show you how random records can be displayed from a table, partitioned on a category. We will also see how to delete these records.

SAMPLE DATA


DECLARE    @ErrorLog TABLE


(


    ErrorID INT,


    ErrorCategory INT,


    ErrorDate DATETIME


)


 


INSERT    @ErrorLog


SELECT    101, 1, '20080217' UNION ALL


SELECT    102, 1, '20080217' UNION ALL


SELECT    103, 1, '20080217' UNION ALL


SELECT    104, 1, '20080217' UNION ALL


SELECT    105, 1, '20080217' UNION ALL


SELECT    106, 1, '20080217' UNION ALL


SELECT    107, 2, '20080217' UNION ALL


SELECT    108, 2, '20080217' UNION ALL


SELECT    109, 1, '20080217' UNION ALL


SELECT    110, 2, '20080217' UNION ALL


SELECT    111, 2, '20080217' UNION ALL


SELECT    112, 1, '20080217'




If you observe, there are 8 records of ErrorCategory 1 and 4 records of ErrorCategory 2. In order to display 50% random records from both Error Category 1 and 2, use this query

QUERY to Select Random Records based on a category


SELECT *


FROM   


(


    SELECT   


    ErrorID, ErrorCategory,


    ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,


    COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID


    FROM    @ErrorLog


) AS TruncTbl


WHERE    .50 * maxID >= recID




RESULTS


ErrorID    ErrorCategory    recID    maxID


102        1                1        8


106        1                2        8


104        1                3        8


103        1                4        8


108        2                1        4


107        2                2        4




Similarly, now if you have to delete random records in a table based on a category, here's how to do so

QUERY to Delete Random Records from a table based on a category


DELETE    TruncTbl


FROM   


(


    SELECT   


    ErrorID,


    ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,


    COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID


    FROM    @ErrorLog


) AS TruncTbl


WHERE    .50 * maxID >= recID




RESULTS


ErrorID    ErrorCategory    ErrorDate


103        1                2008-02-17 00:00:00.000


104        1                2008-02-17 00:00:00.000


108        2                2008-02-17 00:00:00.000


109        1                2008-02-17 00:00:00.000


111        2                2008-02-17 00:00:00.000


112        1                2008-02-17 00:00:00.000




Observe that Both ErrorCategory 1 and 2 had even set of records (8 and 4 each). If there are odd number of records, the results will differ.

Note: SQL Server 2005/2008 also has the TABLESAMPLE clause that can be used to do random sampling. I will explain this feature in my next post


About The Author

Suprotim Agarwal, MCSD, MCAD, MCDBA, MCSE, is the founder of DotNetCurry, DNC Magazine for Developers, SQLServerCurry and DevCurry. He has also authored a couple of books 51 Recipes using jQuery with ASP.NET Controls and a new one recently at The Absolutely Awesome jQuery CookBook.

Suprotim has received the prestigous Microsoft MVP award for nine times in a row now. In a professional capacity, he is the CEO of A2Z Knowledge Visuals Pvt Ltd, a digital group that represents premium web sites and digital publications comprising of Professional web, windows, mobile and cloud developers, technical managers, and architects.

Get in touch with him on Twitter @suprotimagarwal, LinkedIn or befriend him on Facebook

No comments: