Analytical (Window) Functions in SQL - Ranking Functions
Window functions perform aggregate and ranking functions over a particular set of rows (window) that are related to each other. However, unlike regular aggregate functions, use of a window function doesn't cause rows to become grouped - they retain their separate identities. Let's delve into various window functions and their applications using the following dataset - sample menu from the restaurant 'Savory Delights'.
If we wanted to get the price of the most expensive dish on the menu, we can do so by using the following query: We get a result of 329/-
SELECT MAX(PRICE) FROM savory_delights_menu
If we wanted to get the price of most expensive dish per category, we can do so using GROUP BY.
SELECT CATEGORY, MAX(PRICE) AS MAX_PRICE FROM savory_delights_menu GROUP BY CATEGORY
Now, if we needed to arrange the table according to prices of dishes in a descending order but within each category, how would we do it? Aggregate functions simply fail to maintain the original identity of the individual rows by grouping rows together to perform calculations on it. This is where window functions really shine. Now let us see how we'd answer the above question using window functions and explore the syntax of them.
SELECT *, ROW_NUMBER() OVER(PARTITION BY CATEGORY ORDER BY PRICE DESC) AS ROW_NUM FROM savory_delights_menu
OVER() designates the query as a window function.
We can use PARTITION BY clause with OVER() to specify the column on which we need to perform aggregation.
The function ROW_NUMBER() here assigns a unique sequential number starting with 1 to each row in every partition.
ORDER BY clause sorts the resulting partitions according to the descending value of the prices.
Let's explore more window functions and their applications..
RANK() is a window function that assigns rank to each row in the partition of a result set. The ranking is determined by ORDER BY expression in the OVER clause. If there's also a PARTITION BY, rankings are reset for every partition. If we wanted to rank the prices in each partition, we can do so using the following query:
SELECT *, RANK() OVER(PARTITION BY CATEGORY ORDER BY PRICE DESC) AS RANK_NUM FROM savory_delights_menu
If we observe, second and third dishes have same rank '2' because their prices are same and it is the 2nd most expensive price in the 'Continental' section. The next dish, however, is priced at 219/- making it the 3rd most expensive Continental dish. However, its rank is shown as '4' instead of '3'. This is because in case of RANK() function, rows with same value are assigned the same rank, but it also skips the subsequent rank number(s) of additional rows with the same value. So in this case, if there's three dishes priced at 249/- instead of two, the next priced dish would start at '5' instead of '4'.
While DENSE_RANK() function is similar to RANK(), there is one significant difference - DENSE_RANK() doesn't skip rows while ranking. If we consider this function to the above situation, no matter how many Continental dishes are priced at 249/-, the next priced dish would start at '3'.
SELECT *, DENSE_RANK() OVER(PARTITION BY CATEGORY ORDER BY PRICE DESC) AS RANK_NUM FROM savory_delights_menu
Due to DENSE_RANK's characteristic of not skipping ranks, we can use the function in applications such as extracting top 2 expensive dishes in each category or the top 2 cheapest ones in each category.
SELECT * FROM ( SELECT *, DENSE_RANK() OVER(PARTITION BY CATEGORY ORDER BY PRICE DESC) AS RANK_NUM FROM savory_delights_menu ) WHERE RANK_NUM <= 2