SQL assistance--find mismatches in daily data

Database
Fan

SQL assistance--find mismatches in daily data

I'm looking for a way to quantify a problem for my developers. 

I have 3 columns:

CUST_ID  ACTION_DATE  DATE

The action date is when an event happens while the date is snapshot of when in time the data was valid.

THe problem I am having is my ACTION_DATE may not go from NULL to a date until a DATE much later than the actual DATE.

What it should look like:

CUST_ID  ACTION_DATE  DATE

1              NULL              4-1

1             4-2                 4-2

1             4-2                 4-3

1             4-2                 4-4

1             4-2                  4-5

What it really looks like (aka "bad behavior")

CUST_ID  ACTION_DATE  DATE

1              NULL                4-1

1             NULL                 4-2

1             NULL                 4-3

1             4-2                    4-4

1             4-2                    4-5

The question is how do I select CUST_ID that display the "bad" behavior--its not always 2 days--it can be 5-45 days--it is not systematic.

I have greatly simplified this table, and as you can imagine each of our millions of records has an entry for each day--not exactly 3NF.  Normally I could play around with some self-joins but these tables cause trouble with our Teradata loading and jobs will get killed by our admins as well as nasty emails sent about system overuse.  ANy ideas on selecting CUST_IDs where the ACTION_DATE at a later point in time was NULL on the DATE where the ACTION_DATE occured.

I do have a similar tables:  A 'current' table that is just all records with DATE=Today.  My original attempts were to join the  historical onto current table--but i did not have much success, and again, because of table size, i had to severely restrict what i was joining--i could not make a general case--I can only make it work for a specific day.

Tags (1)
5 REPLIES
Supporter

Re: SQL assistance--find mismatches in daily data

I am not sure that I fully understand your model and question. From what I got I would try

select *
from table
qualify min(ACTION_DATE) over (partition by cust_id) <= DATE_COL
and ACTION_DATE is null
;

   

P.S. Date is a reserved word and usully is the system date - so do not use date as columname...

Fan

Re: SQL assistance--find mismatches in daily data

Good point, I had changed my column names as this is a work project, and I don't want to use the "real" names of the columns.  DATE should be HIST_DT.

I have not used qualify and partition and had difficulty understanding exactly what they do. Also we are tyring to cool off our system right now as it is under tremendous load--I better figure out what the query does before I hit submit!

  Any recommendations on resrouces to learn about those keywords/functions?

In the meantime, here the query that works for a single day (Feb 1, 2012):

select count(1) from ACT_CURRENT AC

JOIN ACT_HIST AH on

AC.CUST_ID=AH.CUST_ID and

AH.HIST_DT=date '2012-02-01'

where AC.ACTION_DT = date '2012-02-01' and AH.ACTION_DT is NULL

 My difficulty is generaling so that I can run for all dates.  The other portion that may be confusing is not every CUST_ID gets an ACTION_DT--only about 10% do--but out of those 10% that do, some are not posting an ACTION_DT on the appropriate HIST_DT.  The trick is if it does get one, it should be updated on the HIST_DT that the ACTION_DT occurs, instead I have have multiple HIST_DT records that continue to be null until a random time in the future.

We could think of ACTION_DT as a flag--i think of it as almost a FOR EACH statement:

For every CUST_ID ROW that contains an ACTION_DT in the ACT_CURRENT table,

Go to the ACT_HISTORY table record for that HIST_DT and lookup the ACTION_DT.

If ACTION_DT is null, select that row and return the CUST_ID (or count it etc...)

Senior Apprentice

Re: SQL assistance--find mismatches in daily data

Still hard to understand :-)

Your narration seems to translate to adding another join condition instead of the hard-coded date:

AC.ACTION_DT = AH.HIST_DT

Instead of a join it might be better with an EXISTS (if there are multiple rows with the same HIST_DT)

select *
from ACT_CURRENT AC
where exists
(select * from ACT_HIST AH
where AC.CUST_ID=AH.CUST_ID
and AC.ACTION_DT=AH.HIST_DT
and AH.ACTION_DT is NULL)

Btw, why is in your "What it should look like" example still a NULL for 4-1?

Because it's the "first" row?

Dieter

Fan

Re: SQL assistance--find mismatches in daily data

The action (or event) happened on 4-2 (April 2), thus for the HIST_DT of 4-2 onwards the event ACTION_DT should be '4-2'.  What I've found is that usually the ACTION_DT is NULL for a much longer time.

Enthusiast

Re: SQL assistance--find mismatches in daily data

Hi dnoeth,

Can you pls give the query for finding the space occupied by Indexes only.

Thanks,

Praveen