I have to choose PPI candidates for about 80 tables. Can somebody tell me the criteria for considering a field as a PPI candidate, such as:
1) Table size
2) Fields used in WHERE conditions, etc.
Thanks in advance.
You need to know your data demographics and sizes. Choose the partitioning columns, the number of partitioning expressions, and the number of partitioning levels. You may choose RANGE_N or CASE_N; check carefully when you can also use NO CASE OR UNKNOWN or NO RANGE OR UNKNOWN, based on your data demographics. The whole idea is to gain a performance benefit that suits the business or end users' requirements, especially when the partitioning columns appear in the WHERE conditions of queries.
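As a rough sketch (all table and column names here are made up for illustration), a RANGE_N and a CASE_N partitioning expression, each with its catch-all NO RANGE OR UNKNOWN / NO CASE OR UNKNOWN partition, might look like:

```sql
-- Hypothetical table: range partitioning on a date column,
-- with a catch-all partition for out-of-range and NULL dates.
CREATE TABLE sales_txn
( txn_id   INTEGER
, txn_date DATE
, amount   DECIMAL(12,2)
)
PRIMARY INDEX (txn_id)
PARTITION BY RANGE_N (
    txn_date BETWEEN DATE '2010-01-01' AND DATE '2012-12-31'
             EACH INTERVAL '1' MONTH
  , NO RANGE OR UNKNOWN );

-- Hypothetical CASE_N variant: bucket rows by amount,
-- again with a catch-all for non-matching and NULL values.
CREATE TABLE sales_txn_by_amt
( txn_id INTEGER
, amount DECIMAL(12,2)
)
PRIMARY INDEX (txn_id)
PARTITION BY CASE_N (
    amount <  100
  , amount < 1000
  , amount >= 1000
  , NO CASE OR UNKNOWN );
```

Whether you want the catch-all partitions at all depends on the data demographics: if out-of-range or NULL values cannot occur, omitting them lets the load fail loudly instead of silently collecting rows in one partition.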
You can have a look at dbc.TableSize.
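For example, to list the largest tables in a database (the database name and the GB conversion are placeholders; dbc.TableSize sums CurrentPerm across AMPs):

```sql
-- Approximate current size per table, largest first.
SELECT  DatabaseName
      , TableName
      , SUM(CurrentPerm) / (1024*1024*1024) AS size_gb
FROM    dbc.TableSize
WHERE   DatabaseName = 'MyDb'   -- placeholder database name
GROUP BY 1, 2
ORDER BY size_gb DESC;
```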
If the partitioning columns are not part of the PI, then PI access, join, and aggregation queries may be degraded,
while partition elimination may improve other queries.
This link is also relevant: http://forums.teradata.com/forum/data-modeling/ppi-table-having-a-column-for-same-pi-and-si . Of course, an SI adds a space overhead; weigh the tradeoffs.
A DATE column is a common choice, usually with range partitioning. If the table is joined with other tables, also look at it from the join perspective: there are advantages to joining on the PI of another table.
You can get some pointers from the DBQL tables and the Index Wizard. Finally, it is good to collect stats.
It is always good to check the EXPLAIN :)
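For instance (table and column names are illustrative, not from the original post):

```sql
-- Stats on the PI, on the partitioning column, and on the
-- system-derived PARTITION column help the optimizer cost
-- partition access and elimination.
COLLECT STATISTICS ON sales_txn COLUMN (txn_id);
COLLECT STATISTICS ON sales_txn COLUMN (txn_date);
COLLECT STATISTICS ON sales_txn COLUMN (PARTITION);

-- Check the plan: phrases like "a single partition of" or
-- "n partitions of" indicate partition elimination is happening.
EXPLAIN
SELECT SUM(amount)
FROM   sales_txn
WHERE  txn_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';
```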
PPI on 80 tables? Feels like overkill. What is the total table count?
How much data has your system been tested on? Are there any specific reports/areas that require performance improvement?
A couple of things I would add: adding a PPI doesn't guarantee an improvement in execution time unless the partitioning is actually exploited by most of the queries. So you need to analyze what sort of queries are executed against the tables you are targeting and see whether or not they are candidates for PPI.
Generally, date columns are good candidates for partitioning the data.
Thanks for the responses.
I'm not sure where to start. I know the theory behind PPI; however, when it comes to a real implementation, I'm not sure where to begin. Could you shed light on the points that need to be considered, e.g. 1) what should the size of the table be (greater than 10 GB or so)?
As everyone mentioned, size should not be the deciding factor. Instead, get the list of queries run against the table, the WHERE clauses used, the join criteria, and how frequently each occurs. If the partitioning column is not used in the WHERE clause, access will be similar to a full-table scan. There is also another disadvantage of partitioning, as in the example below.
Suppose table A has a PI defined on two columns, col1 and col2, and a partition defined on another column, col3, which is not part of the PI. If I fire a query like SELECT * FROM A WHERE col1 = val1 AND col2 = val2, then even though the PI covers both columns in the WHERE clause, the query will perform worse than the same query against a non-partitioned table. This is because of the step in the EXPLAIN plan that probes all partitions. On a non-partitioned table, the query would directly take advantage of the PI and return faster. The query is fast and scans only a single partition (or a group of partitions) only when you also constrain col3, e.g. SELECT * FROM A WHERE col1 = val1 AND col2 = val2 AND col3 BETWEEN val31 AND val32.
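To make the two access patterns concrete (table name and literal values are placeholders):

```sql
-- PI access without the partitioning column: the row hash
-- identifies one AMP, but every partition on that AMP must
-- be probed for the hash value.
SELECT *
FROM   A
WHERE  col1 = 'val1'
  AND  col2 = 'val2';

-- Adding a constraint on the partitioning column lets the
-- optimizer combine PI access with partition elimination.
SELECT *
FROM   A
WHERE  col1 = 'val1'
  AND  col2 = 'val2'
  AND  col3 BETWEEN 'val31' AND 'val32';
```

Comparing the EXPLAIN output of the two queries shows the difference: the first mentions all partitions, the second only a single partition or a small range of partitions.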
There is only one point to consider .... and that is .... you need to do some analysis at your end on 'how' the tables are going to be used, and whether there is a column on which partitioning can be done.
For example .... if there is a transaction table that is used to build some aggregates on a daily basis, you can put a PPI on its date column, which will be used in the aggregation script.
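A sketch of that scenario (all names are hypothetical): with the transaction table range-partitioned by its date column, the daily aggregation reads only one partition instead of the whole table.

```sql
-- Assumes sales_txn is range-partitioned on txn_date;
-- the WHERE clause restricts the scan to a single partition.
INSERT INTO daily_sales_agg (txn_date, total_amount)
SELECT txn_date
     , SUM(amount)
FROM   sales_txn
WHERE  txn_date = CURRENT_DATE - 1
GROUP BY txn_date;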
But in the cases, which will be many .... where the reporting needs are not for a specific chunk of data but for the whole data set .... you don't need a PPI.
Again, try to understand the data from a reporting/usage perspective .... find the scenarios in which chunks of data will be used rather than the complete data set, and if that data can be partitioned .... use a PPI.