What's the best way to Partition for BETWEEN?

Database
Highlighted
Enthusiast

What's the best way to Partition for BETWEEN?

Hello everyone!

 

Here's something I've been struggling with. Let's say we have a type 2 SCD named table1 with row_start_date and row_end_date fields, both dates:

 

ROW_START_DATE   | ROW_END_DATE   |   KEY

-------------------------------------------

1/1/2015         | 3/1/2015       |   123

3/2/2015         | 9/1/2016       |   123

1/10/2015        | 1/1/9999       |   456

 

A standard query might be something like

 

select * from table1 where '5/1/2015' between ROW_START_DATE and ROW_END_DATE

(please excuse syntax issues as they aren't germane to my question). 

 

My question is: How can I partition this table to maximize speed, knowing that any date can be entered, and the rows that meet the resulting BETWEEN could have any value less than the date for the row_start_date and any value greater than the date in the row_end_date?

 

Can I specify multiple values at one level of partition hierarchy (as opposed to the nested levels I'm already aware of and don't solve this problem)? 

 

Any help is appreciated! 

 

Joe

 

 

 

4 REPLIES
Senior Apprentice

Re: What's the best way to Partition for BETWEEN?

Hi Joe,

 

No 'partitioning' solution quickly comes to mind.

 

You might need to use an 'old school' approach for this, two NUSI's, one on ROW_START_DATE and one on ROW_END_DATE.

 

The TD optimiser is smart enough (usually Smiley Happy ) to combine the use of multiple indexes in a single query. The logic that is employed is something like:

  • find all rows that meet the ROW_START_DATE criteria and put rowid's into a spool file
  • use the second index (ROW_START_DATE) to find rows that meet both criteria
  • access the base row using the internal rowid

 For this approach you will need stats on each of the selection columns.

 

HTH

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: What's the best way to Partition for BETWEEN?

Hello Dave!

 

Thanks for your help with this! Greatly appreciated! Glad to see I'm not crazy. :)

 

Joe

Senior Apprentice

Re: What's the best way to Partition for BETWEEN?

Hi Joe,

 

Had a thought yesterday and a quick explain shows that you might be able to use partitioning.

 

I tried the following table definition and query on a TD 15.10 system:

CREATE SET VOLATILE TABLE t1
(
col1 INTEGER
,row_start_date DATE
,row_end_date DATE
)
PRIMARY INDEX(col1)
PARTITION BY RANGE_N(row_start_date BETWEEN DATE '2010-01-01' AND DATE '2017-12-31' EACH INTERVAL '1' DAY);

SELECT * FROM t1 WHERE DATE '2015-05-01' BETWEEN ROW_START_DATE AND ROW_END_DATE;

When I explain this query it shows that the PPI is being used.

Explain SELECT * FROM t1 WHERE DATE '2015-05-01' BETWEEN ROW_START_DATE AND ROW_END_DATE;

  1) First, we do an all-AMPs RETRIEVE step from 1947 partitions of
     STUDENT101.t1 with a condition of ("(STUDENT101.t1.row_start_date
     <= DATE '2015-05-01') AND (STUDENT101.t1.row_end_date >= DATE
     '2015-05-01')") into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 1 row (45 bytes).  The estimated time for this step is 0.04
     seconds.
  2) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.04 seconds.

This might help with what you want to do.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: What's the best way to Partition for BETWEEN?

Hmmmm INTERESTING! I'll do some checking over here too on that idea. Thanks for posting it!