Weekly Active Users using Windowing function

Database
Visitor

Weekly Active Users using Windowing function

I have rpt_date, user_id.

I am trying to find Weekly Active Users.

 

I get the right numbers without using windowing function:

 

SELECT a.start_dt, COUNT(DISTINCT uid) AS wau FROM (SELECT start_dt FROM ica.ica_calendar_ref --teradata system calendar ) a, (SELECT uid, rptg_dt FROM <tablename> ) b WHERE rptg_dt BETWEEN start_dt - 6 AND start_dt GROUP BY 1 ORDER BY 1;

 

I am using this query to get the same output using Windowing function, but the numbers are not correct. THis is giving me DAU and not WAU. 

SELECT RPTG_DT, COUNT(DISTINCT UID) AS WAU FROM (SELECT UID, RPTG_DT, CASE WHEN SUM(UID) OVER (PARTITION BY UID ORDER BY RPTG_DT ROWS 7 PRECEDING) >= 1 THEN 1 END AS ACTIVE_OR_NOT FROM ) A WHERE ACTIVE_OR_NOT = 1 GROUP BY 1 ORDER BY 1;

What am I doing wrong...

Tags (2)
1 REPLY
Teradata Employee

Re: Weekly Active Users using Windowing function

The flaws of your statement is when you have missing days for an UID, the ROWS clause inside the window function is just counting number of rows backward : if you have a connection every 5 days, ROWS 7 PRECEDING will go up to 35 days of history.

 

In Ansi SQL you could use the RANGE operator inside your window function instead of ROWS... But it's not yet implemented inside TD.

 

You should generate a row for every day / users, so you can use ROWS phrase  :

with cte_all_data (start_dt, uid) as
(
    select cal.start_dt
         , usr.uid
      from ica.ica_calendar_ref                       as cal
cross join (select uid from <tablename> group by uid) as usr
 left join <tablename>                                as tab  on tab.rptg_dt = cal.start_dt
                                                             and tab.uid     = usr.uid
   qualify max(tab.uid) over(partition by usr.uid order by cal.start_dt rows 7 preceding) is not null
)
  select start_dt
       , count(uid)
    from cte_all_data
group by start_dt
order by start_dt;

But I don't think it would be faster than your first solution.