Propagate row data until next date is encountered (ii)

Database
Enthusiast

Propagate row data until next date is encountered (ii)

I read over this posting: http://community.teradata.com/t5/Database/Propagate-row-data-until-next-date-is-encountered/m-p/3921...

 

I believe I had a similar question, but in my case, I'd like to add rows earlier than the first record. 

 

For example, the volatile table created by the attached program has 21 records. I'd like to create a table that has 24 rows, 6 for each person, with a row for each of the people for Jan 2006, Feb 2006, Mar 2006, Apr 2006, May 2006, and Jun 2006. Right now, the SQL statement with EXPAND ON is producing one with 23, missing the Jan 2006 row for person 'C'.  Would like very much to solve this with continued use of EXPAND ON, if possible.  Many thanks, Brent Gunderson

 

CREATE MULTISET VOLATILE TABLE ts, 
    NO FALLBACK , 
    NO BEFORE JOURNAL, 
    NO AFTER JOURNAL, 
    CHECKSUM = DEFAULT, 
    DEFAULT MERGEBLOCKRATIO 
        ( 
            Person CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,       
            "month" DATE FORMAT 'YY/MM/DD'
        ) 
    ON COMMIT PRESERVE ROWS; 
INSERT INTO ts('A','2006-01-01'); 
INSERT INTO ts('A','2006-02-01'); 
INSERT INTO ts('A','2006-03-01'); 
INSERT INTO ts('A','2006-04-01'); 
INSERT INTO ts('A','2006-05-01'); 
INSERT INTO ts('A','2006-06-01'); 
INSERT INTO ts('B','2006-01-01'); 
INSERT INTO ts('B','2006-03-01'); 
INSERT INTO ts('B','2006-04-01'); 
INSERT INTO ts('B','2006-05-01'); 
INSERT INTO ts('B','2006-06-01'); 
INSERT INTO ts('C','2006-02-01'); 
INSERT INTO ts('C','2006-03-01'); 
INSERT INTO ts('C','2006-04-01'); 
INSERT INTO ts('C','2006-05-01'); 
INSERT INTO ts('C','2006-06-01'); 
INSERT INTO ts('D','2006-01-01'); 
INSERT INTO ts('D','2006-02-01'); 
INSERT INTO ts('D','2006-03-01'); 
INSERT INTO ts('D','2006-04-01'); 
INSERT INTO ts('D','2006-05-01'); 




SELECT B.*, MONTH3, BEGIN(MONTH3) AS "MONTH4"
FROM
    ( 
        SELECT
        A.*
        , PERIOD("MONTH", COALESCE(MIN("MONTH")
                          OVER (PARTITION BY PERSON
                          ORDER BY "MONTH" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), CAST ('2006-06-01' AS DATE)+1)) AS MONTH2
      FROM TS A
    ) AS B
EXPAND ON MONTH2 AS MONTH3 BY INTERVAL '1' MONTH 
ORDER BY 1,2;

 


Accepted Solutions
Apprentice

Re: Propagate row data until next date is encountered (ii)

Hi Brent,

 

So if you want "but I need the DATA_VALUE column to be either 0 or null for those cases where the month was blank in the underlying table (i.e., Feb for person B, Jan for person C and June for person D)" then surely this is even simpler? How about:

SELECT ts_range.person
   ,ts_range."month"
   ,a.datavalue
FROM ts AS a
RIGHT OUTER JOIN (SELECT person  ,"month"
                  FROM (SELECT "month" FROM ts GROUP BY 1) AS dt1
                  CROSS JOIN (SELECT person FROM ts GROUP BY 1) AS dt2) AS ts_range
ON a.person = ts_range.person
  AND ts_range."month" = a."month"
ORDER BY 1,2,3;

This gives you 24 rows, one for each person/month combination with (in my case) NULLs as appropriate.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
1 ACCEPTED SOLUTION
8 REPLIES
Apprentice

Re: Propagate row data until next date is encountered (ii)

Hi Brent,

 

I have to say 'up front' that I do not have a lot of real experience of the 'expand on' feature, so you or someone else may correctly dispute what I put here. Having given my 'cop out' clause...

 

I know that you said "preferably using EXPAND ON" but I think there is a fundamental problem using that feature with your data.

 

AFAIK unless you are using 'anchor period expansion' then 'expand on' will not create rows outside of the original date range. Your data for person 'C' does not have any data in January 2006, so 'interval expansion' will not create that date range for you.

 

Having said that I think I might have an answer for you.

 

Firstly, what exactly are you trying to do. In your post you say "I'd like to create a table that has 24 rows, 6 for each person, with a row for each of the people for Jan 2006, Feb 2006, Mar 2006, Apr 2006, May 2006, and Jun 2006."

 

If what you want is a table listing each combination of person and month then the solution is (I think) very simple. Try the following:

SELECT person  ,"month"
FROM (SELECT "month" FROM ts GROUP BY 1) AS dt1
CROSS JOIN (SELECT person FROM ts GROUP BY 1) AS dt2;

Using your data, this gives a table of 24 rows, one for each combination of Person and Month.

 

However, assuming that you have other columns in the table that need to go into those final result rows I came up with the following. Firstly, I put an extra column into table 'ts', called it 'datavalue' and gave each row a unique value - simply so I could see where my result rows came from. but this also shows that other columns in your source data would get replicated into 'created' rows.

 

SELECT a.*
   ,ts_range."month"
   ,(CASE
     WHEN a."month" = ts_range."month" THEN 1
	 WHEN ADD_MONTHS(a."month",-1) = ts_range."month" THEN 2
	 WHEN ADD_MONTHS(a."month",1) = ts_range."month" THEN 3
	 ELSE 0
	 end) AS month_diff
FROM ts AS a
CROSS JOIN (SELECT person  ,"month"
            FROM (SELECT "month" FROM ts GROUP BY 1) AS dt1
            CROSS JOIN (SELECT person FROM ts GROUP BY 1) AS dt2) AS ts_range
WHERE a.person = ts_range.person
  AND month_diff <> 0
QUALIFY month_diff =  MIN(month_diff) OVER(PARTITION BY a.person, ts_range."month")
ORDER BY 1,2,4;

This gives me a result set of 24 rows, 6 for each person, 4 for each 'month'. With the addition of my 'datavalue' column my result set looks like this (the sequential numbers on the very left of each row is from my SQLA result set window - i.e. row numbers, not part of the data):

	Person	month	datavalue	month	month_diff
1	A	01/01/2006	1	01/01/2006	1
2	A	01/02/2006	2	01/02/2006	1
3	A	01/03/2006	3	01/03/2006	1
4	A	01/04/2006	4	01/04/2006	1
5	A	01/05/2006	5	01/05/2006	1
6	A	01/06/2006	6	01/06/2006	1
7	B	01/01/2006	7	01/01/2006	1
8	B	01/03/2006	8	01/02/2006	2
9	B	01/03/2006	8	01/03/2006	1
10	B	01/04/2006	9	01/04/2006	1
11	B	01/05/2006	10	01/05/2006	1
12	B	01/06/2006	11	01/06/2006	1
13	C	01/02/2006	12	01/01/2006	2
14	C	01/02/2006	12	01/02/2006	1
15	C	01/03/2006	13	01/03/2006	1
16	C	01/04/2006	14	01/04/2006	1
17	C	01/05/2006	15	01/05/2006	1
18	C	01/06/2006	16	01/06/2006	1
19	D	01/01/2006	17	01/01/2006	1
20	D	01/02/2006	18	01/02/2006	1
21	D	01/03/2006	19	01/03/2006	1
22	D	01/04/2006	20	01/04/2006	1
23	D	01/05/2006	21	01/05/2006	1
24	D	01/05/2006	21	01/06/2006	3

(and the dates are UK date format Smiley Happy )

 

How does that look?

 

Cheers, Dave

 

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Junior Contributor

Re: Propagate row data until next date is encountered (ii)

Hi Dave,

if the range of months is fixed the person/month combination can be created using

 

SELECT Person, Begin(pd)
FROM
 (
   SELECT DISTINCT Person FROM ts
 ) AS ts
EXPAND ON PERIOD(DATE '2006-01-01',DATE '2006-06-01'+1) AS pd   BY INTERVAL '1' MONTH 

 

 

 

And the OP's query was quite close, just the first month(s) might be missing. Adding another OLAP can close that gap:

 

SELECT B.*, Begin(MONTH3) AS "MONTH4"
FROM
    ( 
       SELECT
          A.*
        ,PERIOD(CASE WHEN Row_Number() -- set the first row to '2006-01-01' if it's later 
                          Over (PARTITION BY PERSON
                                ORDER BY "MONTH") = 1 
                      AND "MONTH" <> DATE '2006-01-01' 
                     THEN DATE '2006-01-01' 
                     ELSE "MONTH"
                END
               ,Coalesce(Min("MONTH")
                         Over (PARTITION BY PERSON
                               ORDER BY "MONTH"
                               ROWS BETWEEN 1 Following AND 1 Following)
                        ,  DATE '2006-06-01' +1)
               ) AS "MONTH2"
      FROM TS A
    ) AS B

EXPAND ON MONTH2 AS MONTH3 BY INTERVAL '1' MONTH FOR PERIOD (DATE '2006-01-01',DATE '2006-06-01'+1)
ORDER BY 1,2;      

As both OLAP step share the same Partition/Order it will be a single Explain step.

Btw, row 8 will return a different Datavalue, your query 8, mine 7.

 

Brent should add some more details, what he's actually trying to do :-)

 

 

 

Enthusiast

Re: Propagate row data until next date is encountered (ii)

Hi Dave and Dieter,

 

Many thanks for the reponses, much appreciated. I see I could have been more clear about what I wanted; I'll try again. Also, Dave is right on, there are other columns in my 'real' data that will need to be carried.

 

These statements (DATA_VALUE column is just some arbitrary dummy data points):

CREATE MULTISET VOLATILE TABLE TS,
    NO FALLBACK ,
    NO BEFORE JOURNAL,
    NO AFTER JOURNAL,
    CHECKSUM = DEFAULT,
    DEFAULT MERGEBLOCKRATIO
        (
            PERSON CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
            "MONTH" DATE FORMAT 'YY/MM/DD',
            DATA_VALUE INT
        )
    ON COMMIT PRESERVE ROWS;
INSERT INTO TS('A','2006-01-01',1);
INSERT INTO TS('A','2006-02-01',4);
INSERT INTO TS('A','2006-03-01',3);
INSERT INTO TS('A','2006-04-01',2);
INSERT INTO TS('A','2006-05-01',1);
INSERT INTO TS('A','2006-06-01',2);
INSERT INTO TS('B','2006-01-01',5);
INSERT INTO TS('B','2006-03-01',7);
INSERT INTO TS('B','2006-04-01',9);
INSERT INTO TS('B','2006-05-01',4);
INSERT INTO TS('B','2006-06-01',3);
INSERT INTO TS('C','2006-02-01',1);
INSERT INTO TS('C','2006-03-01',7);
INSERT INTO TS('C','2006-04-01',2);
INSERT INTO TS('C','2006-05-01',3);
INSERT INTO TS('C','2006-06-01',4);
INSERT INTO TS('D','2006-01-01',9);
INSERT INTO TS('D','2006-02-01',6);
INSERT INTO TS('D','2006-03-01',4);
INSERT INTO TS('D','2006-04-01',2);
INSERT INTO TS('D','2006-05-01',5);

SELECT B.*, BEGIN(MONTH3) AS "MONTH4"
FROM
    ( 
       SELECT
          A.*
        ,PERIOD(CASE WHEN ROW_NUMBER() -- SET THE FIRST ROW TO '2006-01-01' IF IT'S LATER 
                          OVER (PARTITION BY PERSON
                                ORDER BY "MONTH") = 1 
                      AND "MONTH" <> DATE '2006-01-01' 
                     THEN DATE '2006-01-01' 
                     ELSE "MONTH"
                END
               ,COALESCE(MIN("MONTH")
                         OVER (PARTITION BY PERSON
                               ORDER BY "MONTH"
                               ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                        ,  DATE '2006-06-01' +1)
               ) AS "MONTH2"
      FROM TS A
    ) AS B

EXPAND ON MONTH2 AS MONTH3 BY INTERVAL '1' MONTH FOR PERIOD (DATE '2006-01-01',DATE '2006-06-01'+1)
ORDER BY 1,2;  

Make this table, which is about what I want --- the PERSON and MONTH4 columns are what I am looking for, but I need the DATA_VALUE column to be either 0 or null for those cases where the month was blank in the underlying table (i.e., Feb for person B, Jan for person C and June for person D). I tried to do  CASE WHEN "MONTH" NE "MONTH4" THEN 0 ELSE DATA_VALUE END AS DV2 .... but I get an error about invalid use of expanded column name.

 PERSON MONTH      DATA_VALUE MONTH2                   MONTH4     
 ------ ---------- ---------- ------------------------ ---------- 
 A      2006-01-01          1 (2006-01-01, 2006-02-01) 2006-01-01
 A      2006-02-01          4 (2006-02-01, 2006-03-01) 2006-02-01
 A      2006-03-01          3 (2006-03-01, 2006-04-01) 2006-03-01
 A      2006-04-01          2 (2006-04-01, 2006-05-01) 2006-04-01
 A      2006-05-01          1 (2006-05-01, 2006-06-01) 2006-05-01
 A      2006-06-01          2 (2006-06-01, 2006-06-02) 2006-06-01
 B      2006-01-01          5 (2006-01-01, 2006-03-01) 2006-01-01
 B      2006-01-01          5 (2006-01-01, 2006-03-01) 2006-02-01
 B      2006-03-01          7 (2006-03-01, 2006-04-01) 2006-03-01
 B      2006-04-01          9 (2006-04-01, 2006-05-01) 2006-04-01
 B      2006-05-01          4 (2006-05-01, 2006-06-01) 2006-05-01
 B      2006-06-01          3 (2006-06-01, 2006-06-02) 2006-06-01
 C      2006-02-01          1 (2006-01-01, 2006-03-01) 2006-01-01
 C      2006-02-01          1 (2006-01-01, 2006-03-01) 2006-02-01
 C      2006-03-01          7 (2006-03-01, 2006-04-01) 2006-03-01
 C      2006-04-01          2 (2006-04-01, 2006-05-01) 2006-04-01
 C      2006-05-01          3 (2006-05-01, 2006-06-01) 2006-05-01
 C      2006-06-01          4 (2006-06-01, 2006-06-02) 2006-06-01
 D      2006-01-01          9 (2006-01-01, 2006-02-01) 2006-01-01
 D      2006-02-01          6 (2006-02-01, 2006-03-01) 2006-02-01
 D      2006-03-01          4 (2006-03-01, 2006-04-01) 2006-03-01
 D      2006-04-01          2 (2006-04-01, 2006-05-01) 2006-04-01
 D      2006-05-01          5 (2006-05-01, 2006-06-02) 2006-05-01
 D      2006-05-01          5 (2006-05-01, 2006-06-02) 2006-06-01

Thanks again for the help, Brent

Apprentice

Re: Propagate row data until next date is encountered (ii)

Hi Brent,

 

So if you want "but I need the DATA_VALUE column to be either 0 or null for those cases where the month was blank in the underlying table (i.e., Feb for person B, Jan for person C and June for person D)" then surely this is even simpler? How about:

SELECT ts_range.person
   ,ts_range."month"
   ,a.datavalue
FROM ts AS a
RIGHT OUTER JOIN (SELECT person  ,"month"
                  FROM (SELECT "month" FROM ts GROUP BY 1) AS dt1
                  CROSS JOIN (SELECT person FROM ts GROUP BY 1) AS dt2) AS ts_range
ON a.person = ts_range.person
  AND ts_range."month" = a."month"
ORDER BY 1,2,3;

This gives you 24 rows, one for each person/month combination with (in my case) NULLs as appropriate.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Apprentice

Re: Propagate row data until next date is encountered (ii)

Actually, when do you NULL and when do you want 0?

 

That might affect the above (it will only give you NULLs for missing data).

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Propagate row data until next date is encountered (ii)

Hi Dave --- I am sure it should matter, but I'm not certain it really does for my purposes. I think that if the data were missing in the original, then it should be null. Thanks, Brent

Junior Contributor

Re: Propagate row data until next date is encountered (ii)

Of course you could use EXPAND ON, but this is mainly for filling missing rows with the previous value.

 

Dave's 1st Select returns a matrix of all months per Persons, usually done with a Cross Join, but for a known range of values EXPAND ON is simpler. Then simply outer join your existing data to it:

SELECT dt.Person, dt.mth, TS.data_value
FROM
 ( -- create a matrix of 6 months per Persons
   SELECT Person, Begin(pd) AS mth
   FROM
    ( -- you might use the Primary Key of an existing "Person" table instead
      SELECT DISTINCT Person FROM ts
    ) AS ts                                                 -- or BY ANCHOR MONTH_BEGIN
   EXPAND ON PERIOD(DATE '2006-01-01', DATE '2006-06-01'+1) AS pd BY INTERVAL '1' MONTH 
 ) AS dt
LEFT JOIN TS
  ON dt.Person = TS.Person
 AND dt.mth = "month"
ORDER BY 1,2

 

Edit:

Of course this is quite the same as the answer Dave's posted an hour before, didn't spot it :-)

Enthusiast

Re: Propagate row data until next date is encountered (ii)

Thanks for all the help. Several of the solutions given work for me. Brent