Some of you may be familiar with relational databases other than Teradata and how those other RDBMs utilize indexes.
In Teradata, an “Index” is a physical mechanism that is used to distribute, store, and access data rows. Indexes provide a physical access path to the data and their use can avoid unnecessary full-table scans to locate rows.
To level set, let’s first consider the difference between a Key and an Index. A “Key” is a relational term. The Primary Key of a table is a column set that uniquely identifies a row in a logical table. A key is an identifier and not a physical mechanism.
In Teradata, there are four main types of indexes: Primary Indexes, Secondary Indexes, Join Indexes, and Hash Indexes. Each of these types of indexes have their own distinctive flavors and uses.
All Teradata Database tables require a Primary Index because the system distributes table rows to the AMPs based on primary index values. (* Teradata 13.0 has a NoPI feature for loading staging tables.) To accommodate Teradata’s massively parallel architecture, indexes use a hashing algorithm based on data row hash values as the most efficient means of distributing and retrieving data.
Primary indexes can be either unique or non-unique and partitioned or non-partitioned:
Partitioned primary indexes may have a single partitioning expression or multiple partitioning expressions:
The main purposes for Teradata’s Primary Indexes:
A trade-off that must often be considered is between the needs of full table scan queries and queries that can be range constrained in some way. For example, between strategic queries that need to scan a multi-year history of sales, and tactical queries that only need transactions from, say, the last 60 days. The compromise that’s often made is to break the large table into a number of range partitioned tables which work better for tactical queries than scanning one big table, and to UNION the partitioned tables together for strategic queries that need all the data. But breaking up what should be one table into multiple tables makes them harder to manage and maintain, and although it improves performance for tactical queries, it is not the optimum solution. Up to 64 columns can be specified for a primary index definition.
Partitioned Primary Index (PPI) is a table organization scheme that very elegantly enhances the existing Teradata structures and helps remove this trade-off dilemma. It’s an optional extension to Teradata’s hashed primary indexing that adapts it to more efficiently handle range queries.
What if, instead of having to fully scan a large table to satisfy a query we only had to scan 50% of it? What if we only had to scan 10%….or perhaps even less? This would have a huge, very positive impact on performance, and this is what PPI has been designed to address, to push the envelope for Teradata users who want to see better and better query performance whether it’s for tactical or strategic requests.
PPI has also been designed to be very easily set up and managed and to put a minimum additional burden on the DBA in keeping with our philosophy of low cost database management.
Secondary Indexes provide alternate access paths to the data and may be Hash-ordered or Value-ordered. Secondary indexes may be added or dropped as needed with the caveat that building them requires some amount of system resources and you’ll want to check with your DBA appropriate.
Join Indexes offer a variety of six subtypes that include:
Join Indexes are highly recommended for decision support applications because they often provide superior performance to large table joins and aggregate computations. Any Join Index, whether simple or aggregate, multi-table or single-table, can be sparse. The create statement of the index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. A caution when choosing Join Indexes is that neither MultiLoad nor FastLoad utilities support tables with join indexes. If Join Indexes are needed, a possible workaround could be to FastLoad data into an empty table and then either MERGE or do a INSERT/SELECT into the table with the Join Index.
Hash Indexes are a similar to Join Indexes and have a narrower usage. Hash Indexes are limited to one table and the table’s Primary Index cannot be partitioned.
Hash indexes are useful for queries where the index contains the columns referenced by a query, which will allow the Optimizer to use the index to satisfy the query rather than the underlying base table. Hash indexes can also be defined on a table in place of traditional secondary indexes. Like the Join Index, FastLoad and MulitLoad do not support Hash Indexes.
Criteria for choosing Join Indexes vs NUSIs
The similarities between STJI and NUSI are that STJIs can be defined with the same columns as NUSI; the concept of index covering (the query can be satisfied by columns in the index without accessing the base table) applies to both STJI and NUSI; value ordering is available on both STJI and NUSI; and joins to either STJIs or NUSIs are possible. Teradata does not do partial index retrievals, however, it can use either STJIs or NUSIs to do Full Table Scans of their subtables instead of a FTS of the base table.
The basic differences between STJI and NUSI are that a STJI is similar to a table with a primary index with additional columns defined; a STJI row can be stored on the same AMP or a different AMP as table data row whereas NUSIs are stored on same AMP as table data row; and NUSIs are supported by MultiLoad, but not STJIs.
Advantages of Indexes
Indexes are a retrieval mechanism and the intent of indexes is to lessen the time it takes to retrieve rows from a database and eliminate full table scans.
Disadvantages of Indexes
This is where too much of a good thing can be a disadvantage. Whenever a base table row is updated, deleted, or inserted, Index subtables must be also be updated. The more secondary, join or hash indexes you have, the more maintenance that will have to be performed. All Teradata secondary, hash, and join indexes are stored in subtables and will require extra storage space. Some of the indexes are “allergic” to some Load Utilities so you’ll want to check on what may be impacted as a result of index choices.
A good practice is to use the EXPLAIN before you execute a query will help you determine which indexes are being used for your query.
One of the most important tasks of a DBA is to choose Indexes, CHOOSE WISELY!