Here's something I've been struggling with. Let's say we have a type 2 SCD named table1 with row_start_date and row_end_date fields, both dates:
ROW_START_DATE | ROW_END_DATE | KEY
1/1/2015 | 3/1/2015 | 123
3/2/2015 | 9/1/2016 | 123
1/10/2015 | 1/1/9999 | 456
A standard query might be something like
select * from table1 where '5/1/2015' between ROW_START_DATE and ROW_END_DATE
(please excuse syntax issues as they aren't germane to my question).
My question is: How can I partition this table to maximize speed, knowing that any date can be entered, and the rows that meet the resulting BETWEEN could have any value less than the date for the row_start_date and any value greater than the date in the row_end_date?
Can I specify multiple values at one level of partition hierarchy (as opposed to the nested levels I'm already aware of and don't solve this problem)?
Any help is appreciated!
No 'partitioning' solution quickly comes to mind.
You might need to use an 'old school' approach for this, two NUSI's, one on ROW_START_DATE and one on ROW_END_DATE.
The TD optimiser is smart enough (usually ) to combine the use of multiple indexes in a single query. The logic that is employed is something like:
For this approach you will need stats on each of the selection columns.
Had a thought yesterday and a quick explain shows that you might be able to use partitioning.
I tried the following table definition and query on a TD 15.10 system:
CREATE SET VOLATILE TABLE t1 ( col1 INTEGER ,row_start_date DATE ,row_end_date DATE ) PRIMARY INDEX(col1) PARTITION BY RANGE_N(row_start_date BETWEEN DATE '2010-01-01' AND DATE '2017-12-31' EACH INTERVAL '1' DAY); SELECT * FROM t1 WHERE DATE '2015-05-01' BETWEEN ROW_START_DATE AND ROW_END_DATE;
When I explain this query it shows that the PPI is being used.
Explain SELECT * FROM t1 WHERE DATE '2015-05-01' BETWEEN ROW_START_DATE AND ROW_END_DATE; 1) First, we do an all-AMPs RETRIEVE step from 1947 partitions of STUDENT101.t1 with a condition of ("(STUDENT101.t1.row_start_date <= DATE '2015-05-01') AND (STUDENT101.t1.row_end_date >= DATE '2015-05-01')") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (45 bytes). The estimated time for this step is 0.04 seconds. 2) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.04 seconds.
This might help with what you want to do.