May 03, 2009

Select Random Records From a Table using TableSample in SQL Server 2005/2008

In my previous post, I had explained how to use the NEWID() to Delete Random Records From a Table Using SQL Server 2005/2008

In this post, I will show how to select random records using the TABLESAMPLE clause and use it in different ways. The TABLESAMPLE clause takes a parameter that can be a percent or a sample number representing rows. This clause samples a percentage of
pages randomly. You will get a different result set each time you run a query with the TABLESAMPLE clause. That means that even if you specify to return 50 rows, you will only get an approximate number of rows, that can be less or more than 50 due to the random sampling logic.

Select X% of Records

Returns different set of rows each time the query is executed


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 PERCENT);




Select X% of Records with same random sampling multiple times

Specify REPEATABLE with seed. REPEATABLE clause returns same set of rows every time that it is executed


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 PERCENT)


REPEATABLE (100)




Specify Number of Records instead of %(Percentage)


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 ROWS)




Important Points about TABLESAMPLE:

1. If you have upgraded from SQL 2000 to SQL Server 2005, make sure that the compatibility level of the database is set to at least 90.

2. TABLESAMPLE cannot be used on derived tables, OPENXML, Inline Table-Valued functions, linked server tables or a View.


Read more on TABLESAMPLE over here http://msdn.microsoft.com/en-us/library/ms189108(SQL.90).aspx


About The Author

Suprotim Agarwal, ASP.NET Architecture MVP works as an Architect Consultant and provides consultancy on how to design and develop Web applications.

Suprotim is also the founder and primary contributor to DevCurry, DotNetCurry and SQLServerCurry. He has also written an EBook 51 Recipes using jQuery with ASP.NET Controls.

Follow him on twitter @suprotimagarwal

3 comments:

IGotYourDotnet said...

how can you pull random records out but not duplicate any? Meaning,I have records that can have the same name but different dates or states such as
John Smith 1/2/2010 DE
John Smith 5/14/2008 NJ

how can I only show one name within my random so my users don't get confused?

Madhivanan said...

IGotYourDotnet,

One possible methods is

select name,min(date) as date from table
tablesample (50 percent)
group by name

Also note that tablesample doesn't work well for small set of data
If all data of table stored in a single page, either it returns all rows or no rows

Run this to understand

SELECT
city,min(customerid) FROM Northwind..Customers
group by city It returns 69 rows

Now

SELECT city,min(customerid) FROM Northwind..Customers
TABLESAMPLE (50 percent)
group by city

will return all 69 rows or no rows

So better approach is to use top operator with newid()

SELECT
top 50 percent city,min(customerid) FROM Northwind..Customers
group by city
order by newid()

Libo said...

Excellent tip. I vote for the top operator with newid()