I have a table with information like subscriber added/removed particular feature along with the date it added or removed. Also if he is active or canceled his subscription.
CREATE TABLE subscriber_feature
(SUB_ID INTEGER, -- subscriber id
feature_status VARCHAR(20) -- feature status change ex - 'preminum added' / 'preminum removed'
feature_status_dttm TIMESTAMP(6), --feature added/removed dttm
sub_status_change_dttm TIMESTAMP(6), -sub status change dttm
sub_type_cd CHAR(5), -- ex value 'prep'
Sub_stat_cd CHAR(1)) - ex value 'active', 'cancel' etc
I have to write SQL to give report in following format.
Jan-12 Feb-12 Mar-12 .....Dec-14
Jan-12 100 80 60 .....
Feb-12 0 100 80 ....
Mar-12 0 0 100
premium- 100 180 240 ............
First row is all Jan-12 data. for Jan-12, how many subscribed for the feature, then how many stays in it by every month end.
(Ex) 100 subscribed Jan-12 and 15 unsubscribed for premium feature and 5 canceled there subscription by feb-12. So first row should have 100 for Jan-12 and 80(100 - 15 - 5) for Feb-12 and so on for all months.
Second row is for Feb-12. Same like jan. Last row should have the total count of subscribers subscribed for preminum feature by all month end.
We need to calculate feature added and feature removed using both feature_status and feature_status_dttm. subscription cancel by Sub_stat_cd & sub_status_change_dttm.
Looks very very complex to me. Could some one help me with this? If we can store the result in volatile table and derive the report using another query also fine. Please help.
Can you elaborate on the rules how feature_status/feature_status_dttm and Sub_stat_cd/sub_status_change_dttm are related and add some sample inserts?
What's you TD release?
I would assume some combination of EXPAND ON/TD_SEQUENCED_COUNT plus GROUPING SETS will be able to return this result.
If feature added or removed both will be stored in feature_status. For both the action the event time is stored in Feature_status_dttm. Using the feature_status and Feature_status_dttm only i can determine when feature is added/removed.
If sub_stat_cd is 'cancel' then subscription cancel time will be stored in sub_status_change_dttm. Using this i can determine this subscriber is still with us or not.
Based on both i can determine how many subscribed for particular feature in month 1 and how many still using this feature for 3 years month by month.
Sample inserts below
subscriber_feature values(1000, 'premium added', current_timestamp - interval '2' month, current_timestamp - interval '90' day,'prep', 'active');
--Subscriber 1000 added feature in Oct month.
subscriber_feature values(1004, 'premium added', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');
--Subscriber 1004 added feature Nov month.
subscriber_feature values(1000, 'premium removed', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');
--Subscriber 1000 removed feature nov month.
subscriber_feature values(1004, 'premium added',
current_timestamp - interval '2' month, current_timestamp ,'prep', 'cancel');
--Subscriber 1004 canceled his subscription today (Dec).
subscriber_feature values(1005, 'premium added',
current_timestamp, current_timestamp ,'prep', 'Active');
--New Subscriber 1005 added feature today (Dec).
Report by end of december
Oct-14 Nov-14 Dec-14
Oct -14 1 0 0
Nov-14 0 1 0
Dec-14 0 0 1
Total 1 1 1
First you need the logic to determine the period when a subscription was active, something like this:
,TRUNC(feature_status_dttm, 'mon') AS start_dt -- 1st day of month
,TRUNC(MIN(CASE WHEN feature_status = 'premium removed' -- last day of the previous month before subscription ended
OR Sub_stat_cd = 'cancel'
OVER (PARTITION BY SUB_ID
ORDER BY sub_status_change_dttm
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, 'mon') -1 AS end_dt
QUALIFY feature_status = 'premium added' -- only start of subscription
AND Sub_stat_cd = 'Active'
Of course you might add some conditions to get only rows from the last 36 months...
Then you count the number of rows for each start/end combination:
,COALESCE(end_dt, DATE '9999-12-01') AS end_dt
,COUNT(*) AS CNT
FROM (previous query returning subscription start/end)
GROUP BY start_dt, end_dt
Finally you need to cross join it to a table with one row for each month:
CASE WHEN GROUPING (mth) = 0 THEN (mth (FORMAT 'mmm-yy') (CHAR(6))) ELSE 'Total' END
,SUM(CASE WHEN mth = start_dt AND DATE '2014-09-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Sep-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-10-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Oct-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-11-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Nov-14"
,SUM(CASE WHEN mth = start_dt AND DATE '2014-12-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Dec-14"
( -- any query returning one row per 1st of month, e.g.
SELECT DISTINCT calendar_date AS mth
WHERE calendar_DATE BETWEEN ADD_MONTHS(CURRENT_DATE,-36) AND CURRENT_DATE-1
AND day_of_month = 1
) AS cal
(previous query returning counts per period)
GROUP BY GROUPING SETS ((mth), ())
ORDER BY mth NULLS LAST
Hope I got your logic correct :-)
But there's no way to dynamically name the 36 month columns, you need to add/remove them manually as needed.
Thanks a lot, This is exactly what i was looking for and also learnt some new functions that i never came across in teradata.
Thanks a lot,
Some change in requirement.
Table structure has changed. reporting columns are static.
They are from Jan -15 to Dec - 17.
CREATE TABLE subscriber_feature
(SUB_ID INTEGER, -- subscriber id
feature_code VARCHAR(20) -- feature code
feature_add_dttm DATE, --add dttm
feature_remove_dttm DATE, -remove dttm
feature_cancel_dttm DATE, -- cancel dttm
Sub_stat_cd CHAR(1)) - ex value 'a' - ACTIVE, 'c' - CANCEL etc
INSERT INTO subscriber_feature VALUES(123,'Code','2015-01-15','2015-03-20', NULL,'A' );
INSERT INTO subscriber_feature VALUES(234,'Code','2015-01-20','2015-04-05', '2015-05-02','C');
INSERT INTO subscriber_feature VALUES(345,'Code','2015-02-05','2015-04-15',NULL,'A');
INSERT INTO subscriber_feature VALUES(456,'Code','2015-01-25',NULL,2015-03-21,'C');
INSERT INTO subscriber_feature VALUES(456, 'Code','2015-04-20','2015/05/20',NULL,'A' );
INSERT INTO subscriber_feature VALUES(567, 'Code','2015-02-10','2015-02-25',NULL,'A');
INSERT INTO subscriber_feature VALUES(678, 'Code','2015-03-10',NULL,NULL,'A');
One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.
So count may go wrong for this scenario.
We are considering 6 million records. Can we introduce some table to store intermediate result in between to make the query performance better. If so, what format it should be?
Looking forward for your help.
"One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.
So count may go wrong for this scenario."
Above is not a problem as we gre grouping on start date everything works fine. My bad. But as i said huge no of input is the problem. Experts want to break this process in to two with intermediate table.
What's the performance of the current query? If it's good enough there's no need to use additional tables (besides maybe one permanent table with 36 rows for each 1st of month instead of a query on the calendar)
Can you show the explain?