Database
Fan

Recursive Row Function

Hi guys,

I have a situation as below:

a    1    0.1

a    2    0

a    3    0

b    1    0.1

b    2    0.2

b    3    0

Data is partitioned by Col1, sorted by Col2

I want to create column 4 with the rules below:

• For a given value in Col1: If row 2 col3 value < row 1 col3 value, then average it, else use row 2 col3 value

So at the end the outcome should look like this:

a   1   0.1   0.1

a   2   0      0.05

a   3   0      0.025

b   1   0.1   0.1

b   2   0.2   0.2

b   3   0      0.1

Thanks!

Kind Regards,

Raymond

4 REPLIES
Fan

Re: Recursive Row Function

Updated Rules (to make it clearer):

1. If first row of new Col1, then Col4 = Col3
2. If not first row,
• If Col3 < PrevRow_Col4 then Col4 = (Col3 + PrevRow_Col4) / 2
• else Col4 = Col3
3. Next row.
Senior Apprentice

Re: Recursive Row Function

Hi Raymond,

you probably need recursion for this:

`WITH RECURSIVE prev AS (   SELECT tab.*, col3 AS col4   FROM tab    WHERE col2 = 1      UNION ALL      SELECT t.*,      CASE WHEN t.col3 < prev.col4 THEN (t.col3 + prev.col4)/2 ELSE t.col3 END   FROM tab AS t    JOIN prev     ON t.col1= prev.col1    AND t.col2 = prev.col2+1 )SELECT * FROM prev `

Assuming you simplified the actual data, if col2 is not a sequence you must materialize a ROW_NUMBER in a Volatile Table:

`CREATE VOLATILE TABLE vt AS ( SELECT ...     ,ROW_NUMBER()       OVER (PARTITION BY col1            ORDER BY col2) AS rn   FROM tab )WITH DATAPRIMARY INDEX (col1)ON COMMIT PRESERVE ROWS`

and use rn instead of col2.

Can you add more details, how many rows per col1 and actual data for col3?

Maybe it's possible with OLAP-functions, too.

Fan

Re: Recursive Row Function

Col2 is simplified but they are dates in numeric format (ie 20150215) and they increase on a yearly basis (next input will be 20160215)

By that definition, Number of rows per col1 will be increasing every year.

I'll give the code a try. Thanks!

Senior Apprentice

Re: Recursive Row Function

You could use col2 + 10000, but you will need a FIRST_VALUE or ROW_NUMBER in your seed query, so materializing your data should be the most efficient.