Workload Toolkit – Part 2 – Analyze Secondary Index Usage

In Part 1 of this series, we looked at the excessive use of string manipulation verbs in queries. Here is the link in case you missed it:

http://developer.teradata.com/database/articles/workload-toolkit-part-i-excessive-use-of-string-mani...

For Part 2 of this series, we will look at the implementation of secondary indexes and what impact (or lack thereof) they may be having.

We use secondary indexes on tables to extract just the data a query needs more efficiently (qualification in the WHERE clause) or to support primary key to foreign key joins.
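
To make those two uses concrete, here is a minimal sketch with hypothetical table and column names: a plain NUSI on a date column that can support both a range qualification in the WHERE clause and a join to a calendar dimension.

CREATE INDEX (order_date) ON sales_db.order_fact;        -- hypothetical NUSI

SELECT f.order_id,
       d.day_of_week
FROM   sales_db.order_fact   f
JOIN   sales_db.calendar_dim d
  ON   f.order_date = d.calendar_date                    -- FK-to-PK join
WHERE  f.order_date BETWEEN DATE '2012-01-01'
                        AND DATE '2012-01-31';           -- WHERE-clause qualification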

The columns requiring indexes should be determined by analyzing the various workloads run against the data. However, I have found that if we look at tables of a certain size that have no secondary indexes at all, these are usually fact (or base) tables that have potentially never been optimized for joins or qualification.

If you identify these and then look at the reference tables joined to them, you can usually find some “big bang” tuning opportunities.

This query identifies tables of 10 GB or larger that do not have a secondary index defined (you will need to decide what counts as “big” at your site). You can start with these larger tables and work your way down in size.

-- Find tables larger than 10 GB (adjust the threshold and database list
-- for your site) that have no secondary index defined.
SELECT pti.DatabaseName,
       pti.TableName,
       pds.CURRENT_PERM,
       SUM(CASE WHEN pti.IndexType NOT IN ('P','Q','K')   -- exclude primary index and PK constraint rows
                THEN 1 ELSE 0 END) AS SEC_INDEX_CNT
FROM   DBC.Indices pti,
       (SELECT DatabaseName,
               TableName,
               SUM(CurrentPerm) AS CURRENT_PERM            -- total perm space across all AMPs
        FROM   DBC.TableSize
        GROUP BY 1,2
       ) pds
WHERE  pti.DatabaseName IN ('AAAA','BBBB','CCCC','DDDD')
AND    pti.DatabaseName = pds.DatabaseName
AND    pti.TableName    = pds.TableName
AND    pti.ColumnPosition = 1                              -- count each index only once
GROUP BY 1,2,3
HAVING SEC_INDEX_CNT = 0
AND    CURRENT_PERM > 10000000000                          -- 10 GB
ORDER BY 1, 3 DESC, 2, 4
;

Here are sample results from running the query. Looking at these large tables, especially the ones in the hundreds-of-GB range, what do you think the odds are that they are being either joined to (PK-FK relationships) or qualified on columns such as dates? These are all opportunities for secondary indexes.

DatabaseName   TableName   CURRENT_PERM      SEC_INDEX_CNT
============   =========   ===============   =============
AAAA           TABLE-A      19,040,798,720   0
AAAA           TABLE-B      18,537,593,856   0
BBBB           TABLE-C     191,052,663,296   0
BBBB           TABLE-D     113,774,842,368   0
CCCC           TABLE-E     963,304,786,432   0
CCCC           TABLE-F     389,069,226,496   0
DDDD           TABLE-G     132,060,960,256   0
DDDD           TABLE-H      96,733,093,376   0

Remember, the methodology for determining whether a secondary index is useful is as follows (a rough SQL sketch of the full cycle follows the list):

1) Try collecting statistics on the column. Run an explain and the query to gauge the results.

2) Try adding a secondary index on the column. You have multiple index types to choose from now, including but not limited to:

Regular Secondary Index

Value Ordered Secondary Index (great for dates used in range conditions)

Hash Index

Single Table Join Index (usually used instead of Hash now)

Once again, after applying the secondary index, run an explain and the query to gauge the results. If the results are better with the index, it is worth keeping. If not, remove the index and index statistics.

3) This leaves you with just the statistics on the column. If these alone still improve the results, keep them; if not, remove the statistics.
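
Here is a rough sketch of that test cycle. The database, table, and column names are hypothetical, and the value-ordered NUSI is just one of the index choices listed above, so adapt it to your own case.

-- Step 1: statistics only, then re-check the plan and run the query.
COLLECT STATISTICS ON sales_db.order_fact COLUMN (ship_date);

EXPLAIN
SELECT SUM(sale_amt)
FROM   sales_db.order_fact
WHERE  ship_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

-- Step 2: add a candidate index (value-ordered NUSI, good for date ranges),
-- collect index statistics, and compare the explain and runtime again.
CREATE INDEX (ship_date) ORDER BY VALUES (ship_date) ON sales_db.order_fact;
COLLECT STATISTICS ON sales_db.order_fact INDEX (ship_date);

-- If the index does not help, remove it and its statistics.
DROP INDEX (ship_date) ON sales_db.order_fact;
DROP STATISTICS ON sales_db.order_fact INDEX (ship_date);

-- Step 3: if the column statistics alone no longer improve the plan, drop them too.
DROP STATISTICS ON sales_db.order_fact COLUMN (ship_date);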

The removal or cleanup of non-beneficial indexes and statistics after testing is important. Too many times I run into sites where statistics or indexes were tested, provided no real benefit, but were left in place because the developers did not follow through on the removal. This increases statistics collection runtimes for no benefit.

Hopefully this will help you identify and focus on some missed tuning opportunities in your warehouse and increase your performance.

Good Luck!

Dave

REPLIES
Enthusiast

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Your analysis of SI usage is good. Can you give me an idea or SQL to identify unused NUSIs based on DBQL data?
Teradata Employee

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Yes, if you enable object-level logging, DBQL will capture the indexes that are used. Indexes that are never referenced are not being used. You need to make sure this runs for a sufficient amount of time, so that the captured workload is representative.
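
Enabling it might look something like this (a minimal sketch; the ON ALL scope is an assumption, so adjust it to the users or accounts you actually want to capture):

-- Turn on query logging with object-level detail for all users.
BEGIN QUERY LOGGING WITH OBJECTS ON ALL;

-- Later, after a representative period has been captured:
END QUERY LOGGING ON ALL;
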
Enthusiast

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Thank you. I have object-level logging enabled, but the AccessCount and LastAccessTimeStamp columns in DBC.Indices are showing NULL. Did I miss something?
Enthusiast

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Maybe it is too late and you have already figured it out, but for others who might hit this issue, here is what you should do. Enabling only object-level logging will not do it. You also have to enable ObjectUseCountCollectRate in the DBS Control settings; it is in the General group. Here are the steps:
1. Once logged into the node with the right privileges, type DBSCONTROL.
2. You will see all the groups and fields; focus on GENERAL field 30, which shows = 0 (Minutes, Disabled) if it is not enabled.
3. Then type MODIFY GENERAL 30 = 10 and hit Enter (this locks the DBS and updates the DBS Control GDO).
4. The change only takes effect after you issue the WRITE command. At that point the changes are made.
You do not need to do any TPA reset.
The changes become effective once 600 seconds (the 10 minutes set above) have elapsed or the cache fills. Hope this helps.
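
A rough transcript of that session, based on the steps above (the field number can vary by release, so confirm it against the DISPLAY output first), might look like this:

dbscontrol
DISPLAY GENERAL
MODIFY GENERAL 30 = 10
WRITE
QUIT
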
Enthusiast

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Hi everyone,

I also have a similar question: how can I get a count of which indexes are used and which are not?

@single amp, did the solution WasuKillada mentioned help?

Teradata Employee

Re: Workload Toolkit – Part 2 – Analyze Secondary Index Usage

Object-level logging will populate dbc.DBQLObjTbl for all object types referenced. You can look for object type 'Idx' (I believe) to see the used indexes. Cross-referencing this back to the dbc.Indices table will show you which ones are used, as I mentioned previously.
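
Here is a rough sketch of that cross-reference. It assumes index references are logged with ObjectType = 'Idx' and the index number in ObjectNum, so verify the column names and type codes against your DBC release before relying on it.

-- Secondary indexes that never appear in the object-level DBQL data.
SELECT i.DatabaseName,
       i.TableName,
       i.IndexNumber,
       i.IndexType
FROM   DBC.Indices i
LEFT JOIN
       (SELECT ObjectDatabaseName,
               ObjectTableName,
               ObjectNum
        FROM   DBC.DBQLObjTbl
        WHERE  ObjectType = 'Idx'
        GROUP BY 1,2,3
       ) o
ON     i.DatabaseName = o.ObjectDatabaseName
AND    i.TableName    = o.ObjectTableName
AND    i.IndexNumber  = o.ObjectNum
WHERE  i.IndexType NOT IN ('P','Q','K')   -- secondary indexes only
AND    i.ColumnPosition = 1               -- one row per index
AND    o.ObjectNum IS NULL                -- never referenced in DBQL
ORDER BY 1,2,3;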

Dave