SQL Server Code,Tips and Tricks, Performance Tuning: SQL Challenge: Random Grouping Of Data

Thursday, October 05, 2006

SQL Challenge: Random Grouping Of Data

After Omnibuzz's challenge this morning I decided to come up with a challenge of my own

Let's say you have a table with data (what else would be in the table bananas? )The data looks like this
sony 4
sony 6
toshiba 1
toshiba 3
mitsubishi 2
mitsubishi 5

You want to return the results grouped together by company name but in random order
So for example the first resultset is

sony 4
sony 6
toshiba 1
toshiba 3
mitsubishi 2
mitsubishi 5

you highlight the query hit F5 and the next result is

toshiba 1
toshiba 3
sony 4
sony 6
mitsubishi 2
mitsubishi 5

See where I am going? The resultset is random but the company names are grouped together

Here is the deal no CTE or windowing functions (RANK, DENSE_RANK, ROWNUMBER or NTILE) you know what forget about SQL Server 2005, this has to be able to run on SQL Server 2000
Also no temp tables or table variables (added after first response ;-) )

I have 2 solutions to this, I will post the solutions some time tomorrow
Below is DDL + Insert script

Enjoy

CREATE TABLE #Testcompanies (
Name VARCHAR(50),
ID INT)

INSERT INTO #Testcompanies
SELECT 'toshiba' ,1
UNION ALL
SELECT 'mitsubishi', 2
UNION ALL
SELECT 'toshiba', 3
UNION ALL
SELECT 'sony', 4
UNION ALL
SELECT 'mitsubishi', 5
UNION ALL
SELECT 'sony', 6

Here are the 2 solutions I had in mind

--Query using a sub query and NEWID()
SELECT T.*
FROM #Testcompanies T
JOIN (SELECT DISTINCT TOP 100 PERCENT Name,
NEWID() AS GroupedOrder
FROM #Testcompanies
GROUP BY Name
ORDER BY NEWID()) Z
ON T.Name = Z.Name
ORDER BY Z.GroupedOrder

--Query using RAND()
DECLARE @R FLOAT
SET @R = RAND()

SELECT TOP 100 PERCENT *
FROM #TESTCOMPANIES
ORDER BY RAND(@R * CHECKSUM(NAME))

Does anyone else have a different solution than these two or the two in the comments?

8 comments:

Anonymous said...: Two solutions I thought of:

-- Using a temporary table
SELECT name, NEWID() sortBy INTO #OrderBy FROM
(SELECT name FROM TestCompanies GROUP by name) A

SELECT A.name, A.id FROM TestCompanies A INNER JOIN
#OrderBY B ON A.name=B.name
ORDER BY sortBy, A.id

DROP TABLE #OrderBy

-- Or using a table variable
DECLARE @OrderBy TABLE (name varchar(50), sortBy uniqueidentifier)

INSERT INTO @OrderBy (name, sortBy)
SELECT name, NEWID() FROM (SELECT name FROM TestCompanies GROUP by name) A

SELECT A.name, A.id FROM TestCompanies A INNER JOIN
@OrderBY B ON A.name=B.name
ORDER BY sortBy, A.id; 3:58 AM
Denis said...: Okay fair enough, but now try without temp tables or table variables; 4:09 AM
Omnibuzz said...: Maybe this.. Looks too ugly to me though :)

select a.name,a.id
from
#Testcompanies a
ORDER BY cast(abs(checksum(name))*rand() as int) % (select count(*) from #Testcompanies),name,
newid(); 4:22 AM
Denis said...: I have posted the 2 solutions I had in mind; 6:13 AM
Omnibuzz said...: Hi Denis,
The second solution is really cool.

But the first one might not work.
And its an undocumented behavior.

Its actually based on the query execution plan. The GroupedOrder may be evaluated at the time of ordering of the final result set.
The same advice I got from Hugo when I came up with the concatenation of column in my blog :)

Just check it out..

-Best regards
Omnibuzz; 9:02 AM
Anonymous said...: I should've come across this post earlier, but anyway... this is probably similar to ones you already have. It uses the cheat of having top with a large number to materialise the derived table. Without it, the optimiser applies newid() to each row of 6, not of 3.

select c.*
from
(
select top 2147483647 name, newid() as order1
from
(select name
from #TestCompanies
group by name) c1
) c2
join
#TestCompanies c
on c.name = c2.name
order by c2.order1, c.id; 10:52 PM
Anonymous said...: Actually, try this one - this one is totally legit - not using any workarounds at all:

select c.*
from
(
select name, max(checksum(newid())) as order1
from #TestCompanies
group by name
) c2
join
#TestCompanies c
on c.name = c2.name
order by c2.order1, c.id; 10:54 PM
Anonymous said...: I came up with a very similar solution to the second rob f solution:

SELECT T.*
FROM (SELECT DISTINCT [name], SUBSTRING([name], CAST(rand() * 10 AS INT), 1) AS 'OrderCol'
FROM #Testcompanies
) I
JOIN #Testcompanies T
ON T.Name = I.Name
ORDER BY I.OrderCol

Regards

FTD; 1:33 PM