Partitioning with several columns

Database
Enthusiast

Partitioning with several columns

Hello,

We are designing a new table which stores monthly information of several Datamining models.

The table is like this:

Month
Model
Customer ID
A lot of detail Fields

The table's PK is Month, Model, Customer ID and we perform a monthly load of the information.
We plan to build an UPPI index on Month, Model and Customer ID partitioning the table by Month and Model beacuse all the queries launched aganist the table use the Month and Model in the where.

We want to know if is possible to do this as we don't have write access to the server.

Thanks in advance
6 REPLIES
Enthusiast

Re: Partitioning with several columns

You can't have a multi level partitioning unless you are in TD12.0

...well you can derive an expression from two columns and partition on it say something like (mnth MOD 12)*100 + (model mod 100) + 1 even pre-12.0 (The TD12.0 kind of does similar algorithms internally to derive a single consolidated partition number, but it's a lot better deal)...... so that's kind of "multi level" partitioning, the pre-12.0 version

but it can get real dirty, so don't go there unless you know what to expect out of it. (like ***ALL*** the queries are providing month and model in the filter).. also when you cook up an expression like that, you can't "drop" old partitions, you must "delete" them... may not be a big hassle, but just to mention it.

Why can't you just partition on the month ?

Have you considered using an multi column NUSI constituing of month and model ?

Try a few partition schemes, with/without indexing against a variety of sqls and see which one suites best.

Enthusiast

Re: Partitioning with several columns

Thanks for the answer. After discussing the issue with the dba, we are not going to partition. We were going to load about 20 million rows per month with a history of 24 months. The dba said that those volumes were not very high and partitioning was not noeeded.
Enthusiast

Re: Partitioning with several columns

with 24 months of data , and some estimated grow per month on this 20 million, you would reach a point in 2 yrs that your table at any point would have about half a billion records. And there could be a gradual performance degradation spread across the month as the amount of data grows (whether this is going to be really bad or manageable depends on the number of AMPs in your system... more amps are good). This is because the new records would start getting spread across a large number of data blocks with growth, resulting in a hight amount of IO....

Probably technology would change by then that you wouldn't have to worry about it ;)
Enthusiast

Re: Partitioning with several columns

Thanks joe,

We have noticed the problem of the half billion records in 24 months, in the early design stage we offered a solution storing one month in one table (TABLE_YEAR_MONTH) and creating a view joining all the tables for a few queries related to deep analisys and data mining that are launched monthly and doesn't need a fast response.

We made a sample and the results were good with the deep queries having a performance 15% better that in the current Oracle system, 40% better in the conventional SQL launched for every month and 70% improvement in load times. The user validated the results and liked the solution but the IT department didn't like telling vague reasons such ( too many tabes in the system, dynamic SQL...) and forbade us to do such solution. :-(
Junior Contributor

Re: Partitioning with several columns

"in the early design stage we offered a solution storing one month in one table"

This is partitioning by month, too, it's just done manually instead of automatically.
If you use a PPI by month you'll get partition elimination and then it's the same as accessing a monthly table, you just don't have to think about the right tablename.

Dieter
Enthusiast

Re: Partitioning with several columns

Hello,
I have been looking for answers to a basic question on table partitioning in Teradata. Can someone please shed more light. Here it is -

If we do Range Partitioning on a table based on Numeric values and then different jobs load data corresponding to different partition ranges in parallel - then do we get into a resource contention scenario OR do all parallel running jobs hit specific partition and only acquire lock on a particular partition. In short - the question is - while loading data into a partitioned table [in this case Range Partition], do we acquire a lock at the partition level and multiple programs can run in parallel acquiring lock at individual partitions OR the lock is acquired at whole table level and the parallel jobs get into a contention mode.

Ex: An error table where each program loads specific error information that is encountered based on error code. If the table is range partitioned on error codes of range of 1000s, will individual programs when run in parallel hit their specific partition acquiring only lock to it or they will acquire complete table lock and release them for subsequent jobs when error load is done.

Please advice.