Creating Missing Observations

General

Creating Missing Observations

Given the data below, I need to create an observation for 7/5/2014.  I also need to retain the values from the previous record (record of Period_date=7/4/2014) for this exercise.  I am using the table SYS_CALENDAR.CALENDAR and I have tried both a CROSS and FULL JOINs but was unsuccessful with obtaining the output desired.  Additional requirements is to track the KEY variable from the START date to the END date, if there is any missing days these days needed to be created with the preceding PERIOD_DATE’s information/values.  Teradata version 14 is what is currently supported by our company.

Current Table Output:

Key        Eff_Date              Start                      End                        Due_Date            Period_date

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/2/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/3/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/4/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/6/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/7/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/8/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/9/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/10/2014

Current Desired Output:

Key        Eff_Date              Start                      End                        Due_Date            Period_date

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/2/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/3/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/4/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/5/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/6/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/7/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/8/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/9/2014

A             7/1/2014              7/2/2014              7/31/2014            7/1/2014              7/10/2014

Tags (2)
4 REPLIES
Junior Contributor

Re: Creating Missing Observations

Looks like a perfect case for EXPAND ON, the missing rows get the last existing row's data.

The needed PERIOD can be created on-the-fly:

SELECT Key
Eff_Date,
Start,
End,
Due_Date,
BEGIN(pd2) as Period_date
FROM
(
SELECT
tab.*,
PERIOD(period_date, COALESCE(MIN(period_date)
OVER (PARTITION BY Key
ORDER BY period_date
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), period_date + 1)) AS pd
FROM tab
) AS dt
EXPAND ON pd AS pd2

If you need to create rows after 7/10/2014 change period_date + 1 to end + 1

Re: Creating Missing Observations

Dieter, this worked for a select few Key (accounts), however, when submitting against the entire population the query (slightly adjusted) returns the following error:

 SELECT Failed. 9103:  Invalid Period value constructor. The beginning bound must be less than the ending bound. 

Google search provided some insight, however, I am lacking the proper experience to understand how the first observation could be greater than the ending bound seeing that I partitioned the data by loan_num and clnt_num then ordered by eff_dt (eff_dt should always be in ascending order to the partition variables).  I attached the modified query (which works on a few accounts), can you please review and provide solution/suggestions to my programming error?

select

base.clnt_num

, base.loan_num

, base.next_due_dt

, base.eff_dt

, base.start_date

, base.end_date

, base.period_date_due_date

, begin(pd2) as period_date

from (

select

base.clnt_num

, base.loan_num

, base.next_due_dt

, base.eff_dt

, base.start_date

, base.end_date

, cse.next_due_dt as period_date_due_date

, period(cse.eff_dt,coalesce(min(cse.eff_dt) over (partition by cse.loan_num, cse.clnt_num order by cse.eff_dt rows between 1 following and 1 following),cse.eff_dt+1)) as pd

from eiw_i_mide_cse_v0.cse_loan as cse

join start_cycle as base on base.loan_num=cse.loan_num and base.clnt_num=cse.clnt_num

and cse.rec_eff_dt between base.start_date and base.end_date

where cse.eff_dt between '2014-07-01' and '2014-08-31'

) as base

order by

base.clnt_num

, base.loan_num

, base.next_due_dt

, pd2

expand on pd as pd2

Junior Contributor

Re: Creating Missing Observations

It's hard to tell without knowing the data.

You might not order by the correct column, based on your example eff_dt was the same for all rows, but period_date was unique.

Or there are multiple rows with the same end_date resulting in PERIOD(end_date, end_date).

Both cases will result in invalid periods, a period's end_date must be greater than it's start date, it's defined as  >= start_date and < end_date.

.

Re: Creating Missing Observations

This helped solve the problem Dieter!  Your advice has been greatly appreciated.

Come to find out, there were different observations for the variable 'next_due_dt' for the same loan_num and clnt_num which was causing the problem.  From my understanding, this would have to be accounted for in the PARTITON clause.

 CLNT_NUM LOAN_NUM EFF_DT      NEXT_DUE_DT  previous_due_date    start_date   end_date

1                 AAAA          7/18/2014  7/18/2014       7/18/2014                   7/19/2014   8/17/2014

1                 AAAA          7/4/2014    7/4/2014         7/4/2014                     7/5/2014     8/3/2014

Adding to the PARTITON:

, period(cse.eff_dt,

coalesce(min(cse.eff_dt) over (partition by cse.loan_num, cse.clnt_num, base.next_due_dt order by cse.eff_dt rows between 1 following and 1 following),cse.eff_dt+1)) as pd

Solutioned the problem!