I have a data set like this
I want to add the 7th column (highlighted in orange) based on first six columns. For each consecutive two rows, the criteria is to first check whether "ID", "START_DT" and "KEYWORD" are the same, if yes, then check "PAGE_ID", if "PAGE_ID" is different then previous one ("PAGE_ID" is already ordered), then "IDX" stays the same, otherwise "IDX" will be one unit larger.
If either of "ID", "START_DT", and "KEYWORD" is different from previous row, then "IDX" will be one unit larger.
I think the result can be generated by using row_number and reset with, but I'm not sure about the details, here's my code but it has syntax error, can anyone help me with it?
ROW_NUMBER() OVER (RESET WHEN (ID <> MIN(ID) OVER (ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
OR START_DT <> MIN(START_DT) OVER (ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
OR KEYWORD <> MIN(KEYWORD) OVER (ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
OR PAGE_ID = MIN(PAGE_ID) OVER (ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING))) AS IDX
Thanks a lot!
You want to use RANK() with TIES DENSE, not Row_number(). RESET won't work because it will start counting again with 1! So you need to derive another value from PAGE_ID to reflect what you really want. Try something like this:
select ID, SEQNUM, START_DT, KEYWORD, PAGE_ID, ITEM_LIST
,Rank() Over(order by ID, SEQNUM, START_DT, derivedPAGE_ID With Ties Dense) as IDX
From (select ID, SEQNUM, START_DT, KEYWORD, PAGE_ID, ITEM_LIST
,Min(PAGE_ID) Over(partition by ID, START_DT, KEYWORD
order by ID, SEQNUM, START_DT, KEYWORD, PAGE_ID
rows between 1 preceding and 1 preceding) prevPAGE_ID
,(case when PrevPAGE_ID is null then PAGE_ID
when PAGE_ID = PrevPAGE_ID then SEQNUM+PAGE_ID
else PrevPAGE_ID end) derivedPAGE_ID
FROM mytable) DT
My data is sorted by 'id', 'start_dt', 'keyword', 'seqnum' and some other columns not shown in the table. I have created a new column 'RNUM' (highlighted in blue) to represent the order, which is the actual row number of each records.
Thanks for your answer, but the result is not what I expected. I didn't find the use of 'Ties Dense' in teradata online, could you elaborate it a little?
Thank you all for your guidance, I have figured out the code myself, which is
CASE WHEN PAGE_ID = 1 THEN 1 ELSE 0 END PAGE_ID_IND,
SUM(PAGE_ID_IND) OVER (ORDER BY RNUM ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY 1, 2, 3, 4, 5;
How are PAGE_ID & RNUM calculated? It might be possible to add the result without an additional STAT-step in Explain.