I have a question about row partitioning in Teradata while doing the physical design of tables. What criteria should we consider when deciding whether a table needs row partitioning? One criterion is the number of rows in the table: say, if a table will have more than 1 million rows, should it be partitioned? Or should the size of the table (number of rows * size of each row) be the criterion: say, if a table is larger than 10GB, should partitioning be considered? If we partition a small table, what are the overheads? In all these tables, I am considering partitioning on a date column. That date column is generally used in either '=' or '>=' conditions in queries.
I read through some documentation, but no clear-cut criteria are given for deciding this. Any detailed material would really help.
People usually use row partitioning to make the most frequent types of queries more efficient. For example, if 90% of all queries only need to look at this month's rows (or any one month's), then it makes sense to partition by month. The Optimizer will recognize that all the rows it needs are in one partition and won't bother scanning the others. Likewise, if a query wants to compare data between last month and 12 months prior, the Optimizer will know it only needs to scan two partitions.
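As a rough sketch (the table and column names here are hypothetical, not from the thread), a month-granularity partitioned primary index on a date column would look something like this in Teradata DDL:

```sql
-- Hypothetical sales table, row-partitioned by month on the date column.
-- With predicates such as sale_dt = DATE '2024-03-15' or
-- sale_dt >= DATE '2024-01-01', the Optimizer can eliminate
-- all partitions outside the qualifying month(s).
CREATE TABLE sales
( sale_id  INTEGER NOT NULL
, sale_dt  DATE    NOT NULL
, amount   DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_dt BETWEEN DATE '2020-01-01' AND DATE '2029-12-31'
    EACH INTERVAL '1' MONTH
  , NO RANGE
);
```

The `NO RANGE` partition catches any rows whose date falls outside the declared range, so inserts outside 2020-2029 do not fail.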
Column partitioning is a similar choice, used to isolate less-frequently accessed columns.
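For comparison, a column-partitioned (columnar) table, which in Teradata requires a NoPI table, could be sketched like this (names again hypothetical):

```sql
-- Hypothetical column-partitioned table: wide, rarely-read columns
-- such as notes are stored in their own column containers, so queries
-- that skip them read less data.
CREATE TABLE sales_cp
( sale_id  INTEGER NOT NULL
, sale_dt  DATE    NOT NULL
, notes    VARCHAR(5000)
)
NO PRIMARY INDEX
PARTITION BY COLUMN;
```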
Thanks for the reply. As I understand it, it depends entirely on how we access the table.
So, on what basis should we decide to partition or not partition a table, i.e., the number of rows or the size of the table? That's what my original question was about.
Does the size of the table (number of rows * size of each row, which is how we calculate it) have any impact on this? Should we consider a threshold (say, 10GB) below which we should not partition any table? Any insights would be helpful. I went through the manuals but couldn't get an exact answer there.
For smaller tables, a full table scan would be used if partitioning is not in place. Would that not be efficient enough anyway, since the table is small? Partitioning also increases the row header size, and PI-to-PI joins perform more slowly if the partitioning column is not included in the join condition. So what is the trade-off here?
You make a lot of good points, and I know of no formulaic answers; if there were one, it would be based on the number of AMPs (or perhaps "T-Cores") in the system. For instance, 10GB may sound small to most of us, but maybe not if there are only 8 AMPs. And yes, there are trade-offs like the PI join considerations. Usually when a physical design choice is not obvious in a particular case, people test with the available choices and learn empirically which works best. Sometimes that test even has to be done in production, with careful monitoring of course, and with a fallback plan.
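One way to do that empirical testing is to compare EXPLAIN output for the candidate designs before running anything heavy. Assuming a hypothetical table partitioned by month on a `sale_dt` column, the plan text tells you whether partition elimination actually happens for your typical predicates:

```sql
-- Inspect the Optimizer's plan for a typical query against the
-- hypothetical partitioned table. If partition elimination works,
-- the plan should mention scanning a single partition (or a small
-- number of partitions) rather than all of them.
EXPLAIN
SELECT SUM(amount)
FROM   sales
WHERE  sale_dt >= DATE '2024-01-01'
AND    sale_dt <  DATE '2024-02-01';
```

If the plan still shows an all-partition scan for your most frequent queries, the partitioning expression is probably not matching the predicates and the overheads may not be worth it.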