Recursive Row Function

Database

Recursive Row Function

Hi guys,

I have a situation as below:

a    1    0.1

a    2    0

a    3    0

b    1    0.1

b    2    0.2

b    3    0

Data is partitioned by Col1, sorted by Col2

I want to create column 4 with the rules below:

  • For a given value in Col1: If row 2 col3 value < row 1 col3 value, then average it, else use row 2 col3 value

So at the end the outcome should look like this:

a   1   0.1   0.1

a   2   0      0.05

a   3   0      0.025

b   1   0.1   0.1

b   2   0.2   0.2

b   3   0      0.1

Thanks!

Kind Regards,

Raymond

4 REPLIES

Re: Recursive Row Function

Updated Rules (to make it clearer):

  1. If first row of new Col1, then Col4 = Col3
  2. If not first row,
    • If Col3 < PrevRow_Col4 then Col4 = (Col3 + PrevRow_Col4) / 2
    • else Col4 = Col3
  3. Next row.
Junior Contributor

Re: Recursive Row Function

Hi Raymond,

you probably need recursion for this:

WITH RECURSIVE prev AS
(
SELECT tab.*, col3 AS col4
FROM tab
WHERE col2 = 1

UNION ALL

SELECT t.*,
CASE WHEN t.col3 < prev.col4 THEN (t.col3 + prev.col4)/2 ELSE t.col3 END
FROM tab AS t
JOIN prev
ON t.col1= prev.col1
AND t.col2 = prev.col2+1
)
SELECT *
FROM prev

Assuming you simplified the actual data, if col2 is not a sequence you must materialize a ROW_NUMBER in a Volatile Table:

CREATE VOLATILE TABLE vt AS
( SELECT ...
,ROW_NUMBER()
OVER (PARTITION BY col1
ORDER BY col2) AS rn
FROM tab
)
WITH DATA
PRIMARY INDEX (col1)
ON COMMIT PRESERVE ROWS

and use rn instead of col2.

Can you add more details, how many rows per col1 and actual data for col3?

Maybe it's possible with OLAP-functions, too.

Re: Recursive Row Function

Col2 is simplified but they are dates in numeric format (ie 20150215) and they increase on a yearly basis (next input will be 20160215)

By that definition, Number of rows per col1 will be increasing every year.

I'll give the code a try. Thanks!

Junior Contributor

Re: Recursive Row Function

You could use col2 + 10000, but you will need a FIRST_VALUE or ROW_NUMBER in your seed query, so materializing your data should be the most efficient.