Complex analytical query help

Database
Enthusiast

Complex analytical query help

Dear Folks,

I have a table with information like subscriber added/removed particular feature along with the date it added or removed. Also if he is active or canceled his subscription.

CREATE TABLE subscriber_feature
(SUB_ID INTEGER, -- subscriber id
feature_status VARCHAR(20) -- feature status change ex - 'preminum added' / 'preminum removed'
feature_status_dttm TIMESTAMP(6), --feature added/removed dttm
sub_status_change_dttm TIMESTAMP(6), -sub status change dttm
sub_type_cd CHAR(5), -- ex value 'prep'
Sub_stat_cd CHAR(1)) - ex value 'active', 'cancel' etc
PRIMARY INDEX(sub_id);

I have to write SQL to give report in following format.

        Jan-12     Feb-12    Mar-12   .....Dec-14
Jan-12 100 80 60 .....
Feb-12 0 100 80 ....
Mar-12 0 0 100
.
.
Dec-14
total
premium- 100 180 240 ............
Feature

First row is all Jan-12 data. for Jan-12, how many subscribed for the feature, then how many stays in it by every month end.

(Ex) 100 subscribed Jan-12 and  15 unsubscribed for premium feature and 5 canceled there subscription by feb-12. So first row  should have 100 for Jan-12 and 80(100 - 15 - 5) for Feb-12 and so on for all months.

Second row is for Feb-12. Same like jan. Last row should have the total count of subscribers subscribed for preminum feature by all month end. 

We need to calculate feature added and feature removed using both feature_status and feature_status_dttm. subscription cancel by Sub_stat_cd & sub_status_change_dttm.

Looks very very complex to me. Could some one help me with this? If we can store the result in volatile table and derive the report using another query also fine. Please help.

Thanks.

8 REPLIES
Junior Contributor

Re: Complex analytical query help

Can you elaborate on the rules how feature_status/feature_status_dttm and Sub_stat_cd/sub_status_change_dttm are related and add some sample inserts?

What's you TD release?

I would assume some combination of EXPAND ON/TD_SEQUENCED_COUNT plus GROUPING SETS will be able to return this result.

Enthusiast

Re: Complex analytical query help

Hi Dieter,

If feature added or removed both will be stored in feature_status. For both the action the event time is stored in  Feature_status_dttm. Using the feature_status and Feature_status_dttm only i can determine when feature is added/removed.

If sub_stat_cd is 'cancel' then  subscription cancel time will be stored in sub_status_change_dttm. Using this i can determine this subscriber is still with us or not.

Based on both i can determine how many subscribed for particular feature in month 1 and how many still using this feature for 3 years month by month.

Sample inserts below

insert into subscriber_feature values(1000, 'premium added', current_timestamp - interval '2' month, current_timestamp - interval '90' day,'prep', 'active');

--Subscriber 1000 added feature in Oct month.

insert into subscriber_feature values(1004, 'premium added', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');

--Subscriber 1004 added feature Nov month.

insert into subscriber_feature values(1000, 'premium removed', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');

--Subscriber 1000 removed feature nov month.

insert into subscriber_feature values(1004, 'premium added', current_timestamp - interval '2' month, current_timestamp ,'prep', 'cancel');

--Subscriber 1004 canceled his subscription today (Dec).

insert into subscriber_feature values(1005, 'premium added', current_timestamp, current_timestamp ,'prep', 'Active');

--New Subscriber 1005 added feature today (Dec).

Report by end of december

                   Oct-14    Nov-14   Dec-14

Oct -14        1             0             0

Nov-14        0              1             0

Dec-14        0              0             1

Total            1              1             1

Thanks

Enthusiast

Re: Complex analytical query help

I am using TD14

Junior Contributor

Re: Complex analytical query help

First you need the logic to determine the period when a subscription was active, something like this:

SELECT
SUB_ID
,feature_status
,sub_type_cd
,Sub_stat_cd
,TRUNC(feature_status_dttm, 'mon') AS start_dt -- 1st day of month
,TRUNC(MIN(CASE WHEN feature_status = 'premium removed' -- last day of the previous month before subscription ended
OR Sub_stat_cd = 'cancel'
THEN sub_status_change_dttm
END)
OVER (PARTITION BY SUB_ID
ORDER BY sub_status_change_dttm
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, 'mon') -1 AS end_dt
FROM subscriber_feature
QUALIFY feature_status = 'premium added' -- only start of subscription
AND Sub_stat_cd = 'Active'

Of course you might add some conditions to get only rows from the last 36 months...

Then you count the number of rows for each start/end combination:

SELECT
start_dt
,COALESCE(end_dt, DATE '9999-12-01') AS end_dt
,COUNT(*) AS CNT
FROM (previous query returning subscription start/end)
GROUP BY start_dt, end_dt

Finally you need to cross join it to a table with one row for each month:

SELECT
CASE WHEN GROUPING (mth) = 0 THEN (mth (FORMAT 'mmm-yy') (CHAR(6))) ELSE 'Total' END
,...
,SUM(CASE WHEN mth = start_dt AND DATE '2014-09-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Sep-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-10-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Oct-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-11-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Nov-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-12-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Dec-14"
FROM
( -- any query returning one row per 1st of month, e.g.
SELECT DISTINCT calendar_date AS mth
FROM sys_calendar.CALENDAR
WHERE calendar_DATE BETWEEN ADD_MONTHS(CURRENT_DATE,-36) AND CURRENT_DATE-1
AND day_of_month = 1
) AS cal
CROSS JOIN
(previous query returning counts per period)
GROUP BY GROUPING SETS ((mth), ())
ORDER BY mth NULLS LAST

 Hope I got your logic correct :-)

But there's no way to dynamically name the 36 month columns, you need to add/remove them manually as needed.

Enthusiast

Re: Complex analytical query help

Hi Dieter,

Thanks a lot, This is exactly what i was looking for and also learnt some new functions that i never came across in teradata. 

Thanks a lot,

Naga

Enthusiast

Re: Complex analytical query help

Hi dieter,

Some change in requirement.

Table structure has changed. reporting columns are static.

They are from Jan -15 to Dec - 17.

CREATE TABLE subscriber_feature

(SUB_ID INTEGER,  -- subscriber id

feature_code VARCHAR(20) -- feature code

feature_add_dttm DATE, --add dttm

feature_remove_dttm DATE,  -remove dttm

feature_cancel_dttm DATE,  -- cancel dttm

Sub_stat_cd CHAR(1))  - ex value 'a' - ACTIVE, 'c' - CANCEL  etc

PRIMARY INDEX(sub_id);

INSERT INTO  subscriber_feature VALUES(123,'Code','2015-01-15','2015-03-20', NULL,'A' );

INSERT INTO subscriber_feature VALUES(234,'Code','2015-01-20','2015-04-05', '2015-05-02','C');

INSERT INTO subscriber_feature VALUES(345,'Code','2015-02-05','2015-04-15',NULL,'A');

INSERT INTO  subscriber_feature VALUES(456,'Code','2015-01-25',NULL,2015-03-21,'C');

INSERT INTO  subscriber_feature VALUES(456, 'Code','2015-04-20','2015/05/20',NULL,'A' );

INSERT INTO subscriber_feature VALUES(567, 'Code','2015-02-10','2015-02-25',NULL,'A');

INSERT INTO  subscriber_feature VALUES(678, 'Code','2015-03-10',NULL,NULL,'A');

One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.

So count may go wrong for this scenario.

We are considering 6 million records. Can we introduce some table to store intermediate result in between to make the query performance better. If so, what format it should be?

Looking forward for your help.

Thanks

Enthusiast

Re: Complex analytical query help

Hi,

"One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.

So count may go wrong for this scenario."

Above is not a problem as we gre grouping on start date everything works fine. My bad. But as i said huge no of input is the problem. Experts want  to break this process in to two with intermediate table.

thanks.

Junior Contributor

Re: Complex analytical query help

What's the performance of the current query? If it's good enough there's no need to use additional tables (besides maybe one permanent table with 36 rows for each 1st of month instead of a query on the calendar)

Can you show the explain?