Missing Functions: DENSE_RANK

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Junior Contributor

Some of the SQL:2008 analytical functions are not implemented in Teradata, but most of them can be re-written using existing features.

In this post I will cover the ranking algorithms including DENSE_RANK.

Ranking definitions

The Wikipedia-article on Ranking lists five algorithms.

Of course all of them return the same output for unique data, but non-unique values will produce different results with duplicate numbers and gaps in the ranking.

Standard SQL proposes syntax for three out of those five variations, while Teradata only implements two of them. In this post I will focus on the missing syntax assuming you're familiar with the existing functions RANK and ROW_NUMBER.

Wikipedia's #4: Ordinal ranking "1234"

A sequential number.

No duplicate values & no gaps, but non-deterministic, i.e. same row might get different ranking number when query runs a second time .

Implemented in SQL:2008 and Teradata as ROW_NUMBER.

Wikipedia's #1: Standard competition ranking "1224"

One plus the number of rows with a value less than the current value.

Duplicate values & gaps.

Implemented in SQL:2008 and Teradata as RANK.

Wikipedia's #2: Modified competition ranking "1334"

The number of rows with a value less than or equal to the current value.

Duplicate values & gaps.

There's no special syntax for this one in SQL:2008, but it's straightforward:

COUNT(*) OVER (PARTITION BY part_col ORDER BY data_col RANGE UNBOUNDED PRECEDING)

Unfortunately the RANGE syntax in the OLAP window definition is not implemented in Teradata. But "number of rows with a value less than or equal to the current value" can also be expressed using a RANK/COUNT combination:

RANK() OVER (PARTITION BY part_col ORDER BY data_col) - 1
+ COUNT(*) OVER (PARTITION BY part_col, data_col)

The previous calculation returns the expected result, but explain shows two "STAT FUNCTION" steps.

The most efficient formula with just a single step is based on a reversed order in the RANK:

COUNT(*) OVER (PARTITION BY part_col)
- RANK() OVER (PARTITION BY part_col ORDER BY data_col DESC) + 1

Wikipedia's #5: Fractional ranking "1 2.5 2.5 4"

One plus the number of rows with a value less than the current value plus half the number of items equal to it.

Duplicate values & gaps.

This is also easy to translate into SQL, but the optimizer has to add another STATS step again, which will double runtime:

RANK() OVER (PARTITION BY part_col ORDER BY data_col)
+ (0.5 * (COUNT(*) OVER (PARTITION BY part_col, data_col) - 1))

Wikipedia's #3: Dense ranking "1223"

The number of distinct values less than or equal to the current value.

Duplicate values, but no gaps.

It's the SQL:2008 DENSE_RANK function, which can be expressed in different ways in Teradata, but all of them are much slower than the built-in RANK:

SELECT
t1.part_col
,t1.data_col
,dt.rnk AS "DENSE_RANK"
FROM tab AS t1
JOIN
(
SELECT
part_col
,data_col
,RANK() OVER (PARTITION BY part_col ORDER BY data_col) AS rnk
FROM tab
GROUP BY 1,2
) AS dt
ON t1.part_col = dt.part_col
AND t1.data_col = dt.data_col;

This query has to access the same table or view twice, which is a big disadvantage if it there are joins or complex conditions included.

Following solutions avoid double access, but are based on nested OLAP steps:

SELECT
part_col
,data_col
--,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented
,SUM(CASE WHEN data_col = prev_data_col THEN 0 ELSE 1 END)
OVER (PARTITION BY part_col
ORDER BY data_col, prev_data_col
ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK"
FROM
(
SELECT
part_col
,data_col
,MIN(data_col)
OVER (PARTITION BY part_col
ORDER BY data_col
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_data_col
FROM tab
) AS dt;

SELECT
part_col
,data_col
--,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented
,SUM(CASE WHEN rowno = 1 THEN 1 ELSE 0 END)
OVER (PARTITION BY part_col
ORDER BY data_col, rowno
ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK"
FROM
(
SELECT
part_col
,data_col
,ROW_NUMBER()
OVER (PARTITION BY part_col, data_col
ORDER BY data_col) AS rowno
FROM tab
) AS dt;

If another analytical function must be added to the query you should check if it shares the same PARTITION and ORDER definition and choose the matching version.

Btw, in TD13.10 there's a new "Window Aggregate UDF" feature which allows to create your own OLAP function and the example source code in the manuals is a DENSE_RANK ;-)


The next post in this small series on missing analytical functions will discuss additional SQL:2008 functions based on relative rankings:

CUME_DIST and NTILE.

9 Comments
Enthusiast
Are you sure about the first sql example under Wikipedia's #3: Dense ranking "1223"?
Junior Contributor
Of course, just try it :-)

Dieter
Teradata Employee
Great timing! I was just looking for dense_rank for a porting!
Thanks
Fan
Nice work Dieter. Very informative.
Handy Info, thanks Dieter
Thanks Dieter, It helps me in my work
KVB
Enthusiast

Great work.It helped me a lot in my work area.

Fan

Hi Dieter,

It's a great effort towards simulating common functions available in other databases but not in Teradata.

However, I had a question regarding the two approaches mentioned above. Is the "partition by part_col" needed in the SUM function in the outer query as this will only give the ranking within the group identified by the column given in the partition clause?

Enthusiast
-- Another Example to Get nth highest salaried employee
SELECT empNo
,salary
,prev_value
,SUM(CASE WHEN prev_value = salary THEN 0 ELSE 1 END) OVER (
ORDER BY salary DESC
,prev_value DESC ROWS UNBOUNDED Preceding
) AS "DENSE_RANK"
FROM (
SELECT empNo
,salary
,min(salary) OVER (
ORDER BY salary DESC ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING
) prev_value
FROM employee2
) AS dt;