I have rpt_date, user_id.
I am trying to find Weekly Active Users.
I get the right numbers without using windowing function:
SELECT a.start_dt, COUNT(DISTINCT uid) AS wau FROM (SELECT start_dt FROM ica.ica_calendar_ref --teradata system calendar ) a, (SELECT uid, rptg_dt FROM <tablename> ) b WHERE rptg_dt BETWEEN start_dt - 6 AND start_dt GROUP BY 1 ORDER BY 1;
I am using this query to get the same output using Windowing function, but the numbers are not correct. THis is giving me DAU and not WAU.
SELECT RPTG_DT, COUNT(DISTINCT UID) AS WAU FROM (SELECT UID, RPTG_DT, CASE WHEN SUM(UID) OVER (PARTITION BY UID ORDER BY RPTG_DT ROWS 7 PRECEDING) >= 1 THEN 1 END AS ACTIVE_OR_NOT FROM ) A WHERE ACTIVE_OR_NOT = 1 GROUP BY 1 ORDER BY 1;
What am I doing wrong...
The flaws of your statement is when you have missing days for an UID, the ROWS clause inside the window function is just counting number of rows backward : if you have a connection every 5 days, ROWS 7 PRECEDING will go up to 35 days of history.
In Ansi SQL you could use the RANGE operator inside your window function instead of ROWS... But it's not yet implemented inside TD.
You should generate a row for every day / users, so you can use ROWS phrase :
with cte_all_data (start_dt, uid) as ( select cal.start_dt , usr.uid from ica.ica_calendar_ref as cal cross join (select uid from <tablename> group by uid) as usr left join <tablename> as tab on tab.rptg_dt = cal.start_dt and tab.uid = usr.uid qualify max(tab.uid) over(partition by usr.uid order by cal.start_dt rows 7 preceding) is not null ) select start_dt , count(uid) from cte_all_data group by start_dt order by start_dt;
But I don't think it would be faster than your first solution.