In the A challenge for 2016....... accepted! post I said that I would write a post once every month that would explain very basic stuff about SQL Server. Today is the second post, in this post we will take a look at how you can return the row with the maximum value for a group.
In order to explain the code in this post, I decided to grab the top largest cities in the world as the data. You can find the data here List of cities proper by population
Below is the SQL code to create a table and also the data to populate this table
create table #Temp( Rank int not null, City varchar(100) not null, Population int not null, Density int not null, Country varchar(100)) INSERT #Temp SELECT 1,'Shanghai',24256800,3826,'China' union all SELECT 2,'Karachi',23500000,6663,'Pakistan' union all SELECT 3,'Beijing',21516000,1311,'China' union all SELECT 4,'Lagos',21324000,18206,'Nigeria' union all SELECT 5,'Delhi',16787941,11320,'India' union all SELECT 6,'Tianjin',15200000,1293,'China' union all SELECT 7,'Istanbul',14160467,2593,'Turkey' union all SELECT 8,'Tokyo',13297629,6075,'Japan' union all SELECT 9,'Guangzhou',13080500,1759,'China' union all SELECT 10,'Mumbai',12478447,20680,'India' union all SELECT 11,'Moscow',12197596,4859,'Russia' union all SELECT 12,'São Paulo',11895893,7821,'Brazil' union all SELECT 13,'Shenzhen',10467400,5256,'China' union all SELECT 14,'Jakarta',10075310,15171,'Indonesia' union all SELECT 15,'Seoul',10048850,17134,'South Korea'
Now that we have the data in our table, let's run a simple query to return the row with the most populous city in the world. You can accomplish this in SQL Server by using the TOP operator and ordering the data by population in descending order.
SELECT TOP 1 * FROM #Temp ORDER BY population DESC
Here is the output from that query
Rank City Population Density Country 1 Shanghai 24256800 3826 China
What we want to do next is return one row the most populous city per country.
So let's take it a step at a time, the query below will give you the maximum population by country
SELECT MAX(population) as BiggestPopulation,Country FROM #Temp GROUP BY Country ORDER BY BiggestPopulation DESC
Here is the output
BiggestPopulation Country 24256800 China 23500000 Pakistan 21324000 Nigeria 16787941 India 14160467 Turkey 13297629 Japan 12197596 Russia 11895893 Brazil 10075310 Indonesia 10048850 South Korea
What we can do is join the query we have above with the table itself on the country and population columns. The query will look like this
SELECT T.* FROM( SELECT MAX(population) as BiggestPopulation,Country FROM #Temp GROUP BY Country) X JOIN #Temp t on x.BiggestPopulation = t.Population AND t.Country = x.Country ORDER BY t.Population DESC
Here is the output
+------+-----------+------------+---------+-------------+ | Rank | City | Population | Density | Country | +------+-----------+------------+---------+-------------+ | 1 | Shanghai | 24256800 | 3826 | China | +------+-----------+------------+---------+-------------+ | 2 | Karachi | 23500000 | 6663 | Pakistan | +------+-----------+------------+---------+-------------+ | 4 | Lagos | 21324000 | 18206 | Nigeria | +------+-----------+------------+---------+-------------+ | 5 | Delhi | 16787941 | 11320 | India | +------+-----------+------------+---------+-------------+ | 7 | Istanbul | 14160467 | 2593 | Turkey | +------+-----------+------------+---------+-------------+ | 8 | Tokyo | 13297629 | 6075 | Japan | +------+-----------+------------+---------+-------------+ | 11 | Moscow | 12197596 | 4859 | Russia | +------+-----------+------------+---------+-------------+ | 12 | São Paulo | 11895893 | 7821 | Brazil | +------+-----------+------------+---------+-------------+ | 14 | Jakarta | 10075310 | 15171 | Indonesia | +------+-----------+------------+---------+-------------+ | 15 | Seoul | 10048850 | 17134 | South Korea | +------+-----------+------------+---------+-------------+
SQL Server 2005 introduced a bunch of ranking/windowing functions which make writing these queries a little simpler. We are going to use ROW_NUMBER for this post. Here is what the query will look like
SELECT ROW_NUMBER() OVER (PARTITION By [what do you want it grouped/partitioned by?] ORDER BY [what do you want it ordered by?]) AS Row,*
FROM Table
In our case, we want to partition the result set by country, we also want to order each partition by population descending. The query looks like this
SELECT ROW_NUMBER() OVER (PARTITION By Country ORDER BY population DESC) AS Row,* FROM #Temp ORDER BY Row,Population DESC
Here is what that query returns
+-----+------+-----------+------------+---------+-------------+ | Row | Rank | City | Population | Density | Country | +-----+------+-----------+------------+---------+-------------+ | 1 | 1 | Shanghai | 24256800 | 3826 | China | +-----+------+-----------+------------+---------+-------------+ | 1 | 2 | Karachi | 23500000 | 6663 | Pakistan | +-----+------+-----------+------------+---------+-------------+ | 1 | 4 | Lagos | 21324000 | 18206 | Nigeria | +-----+------+-----------+------------+---------+-------------+ | 1 | 5 | Delhi | 16787941 | 11320 | India | +-----+------+-----------+------------+---------+-------------+ | 1 | 7 | Istanbul | 14160467 | 2593 | Turkey | +-----+------+-----------+------------+---------+-------------+ | 1 | 8 | Tokyo | 13297629 | 6075 | Japan | +-----+------+-----------+------------+---------+-------------+ | 1 | 11 | Moscow | 12197596 | 4859 | Russia | +-----+------+-----------+------------+---------+-------------+ | 1 | 12 | São Paulo | 11895893 | 7821 | Brazil | +-----+------+-----------+------------+---------+-------------+ | 1 | 14 | Jakarta | 10075310 | 15171 | Indonesia | +-----+------+-----------+------------+---------+-------------+ | 1 | 15 | Seoul | 10048850 | 17134 | South Korea | +-----+------+-----------+------------+---------+-------------+ | 2 | 3 | Beijing | 21516000 | 1311 | China | +-----+------+-----------+------------+---------+-------------+ | 2 | 10 | Mumbai | 12478447 | 20680 | India | +-----+------+-----------+------------+---------+-------------+ | 3 | 6 | Tianjin | 15200000 | 1293 | China | +-----+------+-----------+------------+---------+-------------+ | 4 | 9 | Guangzhou | 13080500 | 1759 | China | +-----+------+-----------+------------+---------+-------------+ | 5 | 13 | Shenzhen | 10467400 | 5256 | China | +-----+------+-----------+------------+---------+-------------+
What we really want is the top 10 rows. You might think that you can just add WHERE Row = 1 to the query. This is not allowed since the WHERE clause gets executed before the ROW_Number function as part of the SELECT. What you can do is use a common table expression and then specify WHERE Row = 1 when selecting from the CTE
Here is what it looks like
;WITH CTE AS( SELECT ROW_NUMBER() OVER (PARTITION By Country ORDER BY population DESC) AS Row,* FROM #Temp) SELECT Rank, City,Population,Density,Country FROM CTE WHERE Row = 1 ORDER BY Population DESC
And here is our output, for each country we have only one row, you can also see that 4 cities from the top 10 are not in this list, those cities are in either China or India
+------+-----------+------------+---------+-------------+--+ | Rank | City | Population | Density | Country | | +------+-----------+------------+---------+-------------+--+ | 1 | Shanghai | 24256800 | 3826 | China | | +------+-----------+------------+---------+-------------+--+ | 2 | Karachi | 23500000 | 6663 | Pakistan | | +------+-----------+------------+---------+-------------+--+ | 4 | Lagos | 21324000 | 18206 | Nigeria | | +------+-----------+------------+---------+-------------+--+ | 5 | Delhi | 16787941 | 11320 | India | | +------+-----------+------------+---------+-------------+--+ | 7 | Istanbul | 14160467 | 2593 | Turkey | | +------+-----------+------------+---------+-------------+--+ | 8 | Tokyo | 13297629 | 6075 | Japan | | +------+-----------+------------+---------+-------------+--+ | 11 | Moscow | 12197596 | 4859 | Russia | | +------+-----------+------------+---------+-------------+--+ | 12 | São Paulo | 11895893 | 7821 | Brazil | | +------+-----------+------------+---------+-------------+--+ | 14 | Jakarta | 10075310 | 15171 | Indonesia | | +------+-----------+------------+---------+-------------+--+ | 15 | Seoul | 10048850 | 17134 | South Korea | | +------+-----------+------------+---------+-------------+--+
That is all for this month, see you next month. In the meantime, make sure to explore windowing/ranking functions as well as Common Table Expressions